LIKELIHOOD ESTIMATION AND INFERENCE IN A CLASS OF
NONREGULAR ECONOMETRIC MODELS

VICTOR  CHERNOZHUKOV  AND  HAN  HONG 


Abstract.  In  this  paper  we  study  estimation  and  inference  in  structural  models  with  a  jump  in 
the  conditional  density,  where  the  location  and  size  of  the  jump  are  described  by  regression  lines. 
Two  prominent  examples  are  auction  models,  where  the  density  jumps  from  zero  to  a  positive 
value,  and  the  equilibrium  job  search  model,  where  the  density  jumps  from  one  level  to  another, 
inducing  kinks  in  the  cumulative  distribution  function.  An  early  model  of  this  kind  was  introduced 
by  Aigner,  Amemiya,  and  Poirier  (1976),  but  the  estimation  and  inference  in  such  models  remained 
an  unresolved  problem,  with  the  important  exception  of  the  specific  cases  studied  by  Donald  and 
Paarsch  (1993a)  and  the  univariate  case  in  Ibragimov  and  Has'minskii  (1981a).  The  main  difficulty 
is  the  statistical  non-regularity  of  the  problem  caused  by  discontinuities  in  the  likelihood  function. 
This  difficulty  also  makes  the  problem  computationally  challenging. 

This  paper  develops  estimation  and  inference  theory  and  methods  for  such  models  based  on 
likelihood  procedures,  focusing  on  the  optimal  (Bayes)  procedures,  including  the  MLEs.  We  obtain 
results  on  convergence  rates  and  distribution  theory,  and  develop  Wald  and  Bayes  type  inference 
and  confidence  intervals.  The  Bayes  procedures  are  attractive  both  theoretically  and  computa- 
tionally. The  Bayes  confidence  intervals,  based  on  the  posterior  quantiles,  are  shown  to  provide 
a  valid  large  sample  inference  method  with  good  small  sample  properties.  This  inference  result 
is  of  independent  practical  and  theoretical  interest  due  to  the  highly  non-regular  nature  of  the 
likelihood  in  these  models,  in  which  the  maximum  likelihood  statistic  or  any  finite  dimensional 
statistic  is  not  asymptotically  sufficient. 
1. Introduction

This  paper  develops  theory  for  estimation  and  inference  methods  in  structural  models  with  jumps 
in  the  conditional  density,  where  the  locations  of  the  jumps  are  described  by  parametric  regression 
curves.  The  jumps  in  the  density  are  very  informative  about  the  parameters  of  these  curves,  and 
result  in  non-regular  and  difficult  inference  theory,  implying  highly  discontinuous  likelihoods,  non- 
standard rates  of  convergence  and  inference,  and  considerable  implementation  difficulties.  Aigner, 
Amemiya,  and  Poirier  (1976)  proposed  early  models  of  this  type  in  the  context  of  production  analy- 
sis. Many  recent  econometric  models  also  share  this  interesting  structure.  For  example,  in  structural 
procurement  auction  models,  cf.  Donald  and  Paarsch  (1993a),  the  conditional  density  jumps  from 
zero  to  a  positive  value  at  the  lowest  cost;  in  equilibrium  job  search  models  (Bowlus,  Neumann, 
and  Kiefer  (2001)),  the  density  jumps  from  one  positive  level  to  another  at  the  wage  reservation, 
inducing  kinks  in  the  wage  distribution  function.  In  what  follows,  we  refer  to  the  former  model  as 
the  one-sided  or  boundary  model,  and  to  the  latter  model  as  the  two-sided  model.  In  these  models, 
the  locations  of  the  jumps  are  linked  to  the  parameters  of  the  underlying  structural  economic  model. 
Learning the parameters of the location of the jumps is thus crucial for learning the parameters of
the  underlying  economic  model. 

Several  early  fundamental  papers  have  developed  inference  methods  for  several  cases  of  such  mod- 
els, including  Aigner,  Amemiya,  and  Poirier  (1976),  Ibragimov  and  Has'minskii  (1981a),  Flinn  and 
Heckman  (1982),  Christensen  and  Kiefer  (1991),  Donald  and  Paarsch  (1993a,  1993b,  1996,  2002), 
and  Bowlus,  Neumann,  and  Kiefer  (2001).  Ibragimov  and  Has'minskii  (1981a)  (Chapter  V)  ob- 
tained the  limit  theory  of  the  likelihood-based  optimal  (Bayes)  estimators  in  the  general  univariate 
non-regression case, and obtained the properties of the MLE in the case of a one-dimensional parameter.
van  der  Vaart  (1999)  (Chapters  9.4-9.5)  discussed  the  limit  theory  for  the  likelihood  in  the  univari- 
ate Uniform  and  Pareto  models,  including  Pareto  models  with  parameter-dependent  support  and 
additional  shape  parameters.  Paarsch  (1992)  and  Donald  and  Paarsch  (1993a,  1993b,  1996,  2002) 
introduced  and  developed  the  theory  of  likelihood  (MLE)  and  related  procedures  in  the  one-sided 
regression  models  with  discrete  regressors,  demonstrated  the  wide  prevalence  of  such  models  in 
structural  econometric  modeling,  and  stimulated  further  research  in  this  area. 

Nevertheless, the general inference problem posed by Aigner, Amemiya, and Poirier (1976) has
remained unsolved. Very little is known about likelihood-based estimation and inference
in  the  general  two-sided  regression  model.  In  the  general  one-sided  regression  model,  the  problem 
of  likelihood-based  estimation  and  inference  also  remains  an  important  unresolved  question,  an 
important  exception  being  the  MLE  theory  for  discrete  regressors  developed  by  Donald  and  Paarsch 
(1993a).1  The  general  theory  of  such  regression  models  is  more  involved  and  has  a  substantively 
different  structure  than  the  corresponding  theory  for  the  univariate  (non-regression)  or  dummy 


1There is also a literature on the ad hoc "linear programming" estimators of linear boundary functions covering
the continuous covariate case, see e.g. Smith (1994) (Chernozhukov (2001) provides a detailed review and other related


regressor  case.2  Moreover,  there  is  a  considerable  implementation  problem  caused  by  the  inherent 
computational  difficulty  of  the  classical  (maximum  likelihood)  estimates. 

This  paper  offers  solutions  to  these  open  questions  by  providing  theory  for  estimation  and  inference 
methods  in  both  one  and  two-sided  models  with  general  regressors.  These  methods  rely  on  the 
likelihood-based  optimal3  Bayes  and  also  the  MLE  procedures.  This  paper  demonstrates  that  these 
are  tractable,  computationally  and  theoretically  attractive  ways  to  obtain  parameter  estimates, 
construct  confidence  intervals,  and  carry  out  statistical  inference.  These  results  cover  Bayes  type 
inference  as  well  as  Wald  type  inference. 

We  show  that  Bayes  inference  methods,  based  on  the  posterior  quantiles,  are  valid  in  large  samples 
and  also  perform  well  in  small  samples.  Moreover,  these  inference  methods  are  tractable  and  require 
no  knowledge  of  asymptotic  theory  on  the  practitioner's  part.  These  estimation  methods  are  also 
attractive  due  to  their  well-known  finite-sample  and  large  sample  average  risk  optimality.  They 
are  computationally  attractive  when  carried  out  through  the  Markov  Chain  Monte  Carlo  procedure 
(MCMC),  see  e.g.  Robert  and  Casella  (1998),  which  helps  avoid  the  inherent  curse  of  dimensionality 
in the computation of the MLE.

All  of  these  results  are  preceded  by  a  complete  large  sample  theory  of  likelihood  for  these  models, 
which  is  useful  not  only  for  the  present  analysis  but  also  for  any  kind  of  inference  based  on  the 
likelihood  principle.  Importantly,  we  show  that  the  MLE  is  generally  not  an  asymptotically  sufficient 
statistic  in  these  models  (in  contrast  to  the  non-regression  case  or  dummy  regressor  case).  Therefore, 
the  likelihood  contains  more  information  than  the  MLE  does,  and  the  totality  of  likelihood-based 
procedures  are  generally  not  functions  of  the  MLE  asymptotically,  as  they  are  in  the  non-regression 
or  dummy  regressor  case  (or  regular  models).  This  motivates  the  study  of  the  entire  likelihood  and 
the  wide  class  of  the  likelihood-based  procedures. 


results).  In  some  special  cases,  such  as  homoscedastic  exponential  linear  regression  models,  these  estimators  coincide 
with  the  MLE  asymptotically. 

2In fact, we show in this paper that unlike in the univariate models, such as the Uniform and Pareto models
discussed in detail by van der Vaart (1999), or the dummy regressor case, there are no finite-dimensional sufficient
statistics.  MLE  is  not  asymptotically  sufficient  either,  making  inference  theory  difficult  to  analyze.  We  show  that  the 
limit  likelihoods  depend  on  multivariate  Poisson  point  processes  with  complex  correlation  structure. 

3This  terminology  follows  that  of  Berger  (1993),  p.  17. 


Our  work  is  also  related  to  a  recent  important  contribution  by  Hirano  and  Porter  (2002).  They 
provide  a  detailed  analysis  of  asymptotic  minimax  efficiency  in  a  class  of  boundary  models.4  They 
employ  an  exponential-shift  experiment  framework  along  with  group  analysis  to  generate  new  results 
and  insights  on  the  efficiency  structure  of  Bayesian  estimators  (which  also  motivate  the  present 
research)  and  prove  the  inefficiency  (sub-optimality)  of  the  MLE  under  the  common  mean  squared 
and  absolute  deviation  criteria.  We  study  a  different  set  of  questions  -  focusing  on  the  estimation 
and  inference  problem  in  the  general  two-sided  and  boundary  models. 

We  briefly  summarize  the  contributions  of  this  paper  as  follows.  First,  we  derive  the  large  sample 
behavior  of  the  likelihood  ratio  process  and  show  that  it  approaches  a  simple,  explicit  function  of 
a  Poisson  process  that  tracks  the  extreme  (near-to-jump)  events  and  depends  on  regressors  in  an 
interesting  way.  This  limit  result  is  useful  for  any  inference  that  relies  on  the  likelihood  principle. 
The  limit  is  useful  since  it  can  be  easily  simulated  in  order  to  evaluate  the  limit  distributions  of 
derived  estimators  and  various  likelihood-based  statistics.  To  our  knowledge,  these  results  are  new. 

Second,  we  prove  the  consistency,  derive  the  rates  of  convergence,  and  provide  the  limit  distribu- 
tion of  the  likelihood-based  optimal  estimators  (BE)  and  MLE.  The  results  are  basic  prerequisites 
for  using  these  estimators  in  empirical  work.  More  importantly,  these  results  justify  general  Wald 
type  inference  based  on  limit  distribution,  subsampling,  and  parametric  bootstrap. 

Third, we show that posterior $\tau$-quantiles are asymptotically $(1 - \tau)$-quantile unbiased estimators
of  the  true  parameters.  This  property  implies  the  validity  of  Bayes  type  confidence  intervals  based 
on  the  posterior  quantiles.  These  confidence  intervals  provide  valuable  practical  inference  meth- 
ods since  they  are  simple  to  implement  and  require  no  detailed  knowledge  of  asymptotic  theory. 
This  frequentist  validity  result  is  also  of  general  theoretical  interest  because  it  covers  models  with 
complicated  likelihoods  where  no  finite  dimensional  sufficient  statistics  exist  asymptotically,  and  its 
proof  applies  more  generally  to  other  problems.  We  further  generalize  this  result  to  cover  Bayes 
inference  about  general  smooth  functions  of  parameters,  and  show  that  it  provides  valid  inference 
asymptotically. 

Fourth,  we  briefly  discuss  how  the  well-known  finite-sample  (average-risk)  optimality  of  Bayes 
procedures carries over to the limit.5 The discussion is auxiliary and given here to prove some

4Hirano  and  Porter  (2002)  also  derive  the  limit  distributions  of  BE's  for  continuous  covariate  boundary  models  in  a 
different  form.  (As  Hirano  and  Porter  (2002)  note,  the  present  treatment  of  continuous  covariates  appears  to  precede 
theirs  somewhat.)  An  important  difference  is  that  our  limit  likelihood  is  stated  in  terms  of  a  simple  transform  of  a 
Poisson  process,  hence  it  is  quite  simple  to  simulate  for  inference  purposes  (which  is  our  focus).  Hirano  and  Porter 
(2002)'s  limit  is  implicit,  given  as  a  process  indexed  by  continuous  covariate  values.  Consequently,  their  result  is  more 
suited  for  efficiency  analysis  (which  is  their  focus),  and  its  use  for  classical  inference  in  practice  may  be  infeasible 
without  the  (equivalent)  Poisson  type  representations  obtained  in  this  paper.  Also,  the  present  paper  focuses  on 
inference  in  both  one-  and  two-sided  models. 

5The optimality of Bayes estimates is treated in considerable detail elsewhere in the literature. The recent
contribution  by  Hirano  and  Porter  (2002)  provides  a  detailed  limit-of-experiments  analysis  for  a  class  of  boundary 
models.    Lehmann  and  Casella  (1998)  provides  a  basic  discussion,    van  der  Vaart  (1999),  Chapter  9.3-9.4,  treats 
of the previously stated results and for the justification of BE's, including the approximate and
exact MLEs. The exact MLE, even when bias-corrected, generally does not coincide with optimal
procedures asymptotically. But it is close to the approximate MLE defined as a BE under any loss
function that approximates the delta function such as the 0-1 loss $1[|u| > \epsilon]/\epsilon$ or (truncated) p-th
power losses.6 Such loss functions penalize mistakes differently than the squared loss function. In
that sense, the exact MLE is approximately optimal under any loss function that approximates the
delta  function,  and  may  perform  better  under  the  alternative  loss  functions  than  other  likelihood 
procedures  such  as  posterior  means  or  posterior  medians.  Importantly,  this  implies  that  the  MLE 
generally  can  not  be  dominated  by  any  other  given  BE  when  the  risk  comparisons  are  made  across 
different  loss  functions.  Such  comparisons  are  relevant  when  the  empirical  investigator  does  not 
know  the  loss  function  of  the  end  user  of  her  result.  Thus,  the  MLE  method,  advocated  by  Donald 
and  Paarsch  (1993a)  in  the  context  of  the  discrete-covariate  boundary  models,  provides  a  valuable 
method  for  estimation  and  inference  in  both  the  two-sided  and  one-sided  regression  models.7 

Fifth,  we  show  through  simulation  examples  based  on  an  empirical  auction  model  from  Paarsch 
(1992) that (1) particular BE's and MLEs work quite well and their relative performance depends
critically  on  the  measure  of  risk,  and  (2)  the  Bayes  confidence  intervals  and  the  Wald  confidence 
intervals  based  on  the  limit  distributions  perform  as  accurately  as  the  Wald  confidence  intervals  based 
on  the  parametric  bootstrap,  but  are  much  less  expensive  computationally.  The  Bayes  confidence 
intervals are also the shortest among the methods compared. Thus, this paper
justifies  a  whole  array  of  useful  and  practical  inference  techniques,  ranging  from  Wald  type  to  Bayes 
type  inference  methods. 

2.  The  Model,  Examples,  Assumptions,  Procedures 

This  section  describes  the  model  and  provides  an  informal  discussion  of  the  assumptions,  results, 
and  inference  procedures  developed  in  the  later  sections  of  the  paper. 

2.1. The Model. It is convenient to describe the class of models we consider in terms of a regression
model where the errors have a discontinuous density. Let $(Y_i, X_i)$, $i = 1, \ldots, n$, denote the random
iid sample of size n generated by the model

$$ Y_i = g(X_i, \beta) + \epsilon_i, \tag{2.1} $$


efficiency in the Uniform and Pareto models in the non-regression case. Ibragimov and Has'minskii (1981b), p.93
prove a fundamental result on the generic asymptotic efficiency of Bayes procedures under general, non-primitive
(hard-to-verify) conditions.

6The approximations are Bayes procedures and are optimal in that regard, and hence can be a good substitute for
exact MLE.

7Additional important and distinct properties of the MLE include (i) invariance to reparameterization and (ii)
independence from prior information. Arguably, both properties are very useful.



where $Y_i$ is the dependent variable, $X_i$ is a vector of covariates that has distribution function $F_X$,
and the error $\epsilon_i$ has conditional density $f(\epsilon | X_i, \beta, \alpha)$. The central assumption of the model is that
the conditional density of the error $f(\epsilon | X_i, \beta, \alpha)$ has a jump (or discontinuity) normalized to be at
0, which may depend on the parameters $\beta$ and $\alpha$:


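$$ \lim_{\epsilon \uparrow 0} f(\epsilon | x, \beta, \alpha) = q(x, \beta, \alpha), $$

$$ \lim_{\epsilon \downarrow 0} f(\epsilon | x, \beta, \alpha) = p(x, \beta, \alpha), \tag{2.2} $$

$$ p(x, \beta, \alpha) > q(x, \beta, \alpha) + \delta, \quad \delta > 0, \quad \forall x \in \mathbf{X} = \mathrm{support}(X), \quad \forall (\beta, \alpha) \in \mathcal{B} \times \mathcal{A}. $$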


Hence, in this model the location of the discontinuity in the density of Y conditional on X is given
by the regression function $g(X, \beta)$, which is described by the parameter $\beta$. Thus, there are two
sets of parameters, collected into a vector $\gamma = (\beta', \alpha')'$, where $\beta$ affects the regression curve and
possibly the error distribution and $\alpha$ affects the shape of the error distribution only. We assume
that $\beta \in \mathcal{B} \subset \mathbb{R}^{d_\beta}$ and $\alpha \in \mathcal{A} \subset \mathbb{R}^{d_\alpha}$. We also assume that the parameter set $\mathcal{G} = \mathcal{B} \times \mathcal{A}$ is compact
and convex, and that the true parameter belongs to the interior of this set.

We  consider  two  models:  the  one-sided  model  and  the  two-sided  model.  In  the  one-sided  model, 
the  conditional  density  jumps  from  zero  to  a  positive  constant.  In  the  two-sided  model,  the  condi- 
tional density  jumps  from  one  positive  value  to  another.  The  one-sided  model  is  a  special  case  of  the 
two-sided  model.  In  addition,  Aigner,  Amemiya,  and  Poirier  (1976)  suggested  that  the  two-sided 
model  may  be  applied  to  one-sided  models  in  the  presence  of  outliers,  using  an  additional  side  to 
model  the  outliers.  More  generally,  the  two-sided  model  approximates  models  with  a  sharp  change 
in  the  density,  where  the  location  of  the  change  depends  on  parameters  and  regressors.  The  finite 
sample  distribution  of  parameter  estimates  in  such  models  is  approximated  by  that  in  the  model 
with  a  density  jump.  The  two-sided  models  also  naturally  arise  in  equilibrium  search  models,  see 
e.g.  Bowlus,  Neumann,  and  Kiefer  (2001). 
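To make the jump condition (2.2) concrete, the following is a minimal sketch of one possible two-sided error density. The functional form (two exponential pieces glued at zero), the weight `w`, the rate `lam`, and the function names are all illustrative assumptions and do not come from the paper. With this choice the left limit at zero is q = w * lam and the right limit is p = (1 - w) * lam, so p > q whenever w < 1/2, and w = 0 gives a one-sided (boundary) density.

```python
import numpy as np

def two_sided_density(eps, w=0.3, lam=1.0):
    """Hypothetical error density with a jump at 0 (an assumption, not the paper's).

    Left piece:  w * lam * exp(lam * eps)         for eps < 0
    Right piece: (1 - w) * lam * exp(-lam * eps)  for eps >= 0
    """
    eps = np.asarray(eps, dtype=float)
    left = w * lam * np.exp(lam * eps)
    right = (1.0 - w) * lam * np.exp(-lam * eps)
    return np.where(eps < 0.0, left, right)

def sample_two_sided(n, w=0.3, lam=1.0, seed=0):
    """Draw n errors: negative side with probability w, positive side otherwise."""
    rng = np.random.default_rng(seed)
    magnitude = rng.exponential(1.0 / lam, size=n)
    negative = rng.uniform(size=n) < w
    return np.where(negative, -magnitude, magnitude)

w, lam = 0.3, 1.0
q = float(two_sided_density(-1e-12, w, lam))  # approximate left limit at the jump
p = float(two_sided_density(0.0, w, lam))     # right limit at the jump
eps = sample_two_sided(50_000, w, lam)
share_negative = float(np.mean(eps < 0.0))    # should be close to w
```

Adding these errors to a regression function g(X, beta) yields data whose conditional density jumps from q to p exactly at g(X, beta).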

The key feature of the regression model is that the conditional density of Y given X jumps at
the location $g(X, \beta)$, which depends on the parameter $\beta$ and covariates X. This feature generates
sharp discontinuities in the likelihood, which create statistical non-regularities and computational
difficulties. The discontinuities are highly informative about $\beta$ and imply estimability at rate n.
(The simplest univariate example is the uniform model $U(0, \beta)$, where $\beta$ is estimated at the rate n.)
On the other hand, inference about $\alpha$ is standard in many regards.
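The rate-n claim for the uniform example can be checked with a small simulation. The sketch below is only an illustration: the function name, the sample sizes, and the number of replications are arbitrary choices. It uses the fact that the MLE of the upper endpoint of U(0, beta) is the sample maximum and that n times the estimation error has mean beta * n / (n + 1), so the scaled error stays of order one as n grows.

```python
import numpy as np

def scaled_mle_errors(n, beta=2.0, reps=1000, seed=0):
    """Return n * (beta - max(Y)) over `reps` simulated U(0, beta) samples of size n."""
    rng = np.random.default_rng(seed)
    y = rng.uniform(0.0, beta, size=(reps, n))
    beta_hat = y.max(axis=1)  # MLE of the upper endpoint
    return n * (beta - beta_hat)

beta = 2.0
# Average scaled error for increasing sample sizes; it should hover near beta.
mean_scaled_error = {
    n: float(scaled_mle_errors(n, beta).mean()) for n in (50, 500, 5000)
}
```

A root-n consistent estimator would instead show a scaled error n * (estimate - beta) that grows like the square root of n.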

Note that classification of the model's parameters into $\alpha$ and $\beta$ is motivated statistically, as in
Donald and Paarsch (1993a) and van der Vaart (1999) (who considered univariate Pareto models).
The boundary parameters $\beta$ usually coincide with the main economic parameters, as indicated
earlier. If they do not and the Wald type inference is to be used, then one needs to reparameterize
them into $\alpha$ and $\beta$, see e.g. Donald and Paarsch (1993a).8 However, the practical use of Bayes type
inference or parametric bootstrap methods does not require such reparameterization.

In  the  following,  we  briefly  review  a  structural  example,  which  will  serve  to  illustrate  the  plausi- 
bility of  our  regularity  conditions  and  explain  the  results.  It  also  provides  an  example  for  the  Monte 
Carlo  work. 

Example:  Independent  Private  Value  Procurement  Auction.  Consider  the  following 
econometric  model  of  an  independent  private  value  procurement  auction,  formulated  in  Paarsch 
(1992) and Donald and Paarsch (2002). Here, $Y_i$ is the winning bid for auction i and the covariates
$X_i = (Z_i, m_i)$ describe variation across auctions, where $m_i$ denotes the number of bidders in the
i-th auction minus 1, and $Z_i$ denotes other observed characteristics of auctions.

The bidders' privately observed costs V follow an iid Pareto distribution given X, i.e. the density
of V given X is described by

$$ f_V(v | x) = \frac{\theta_2 \theta_1^{\theta_2}}{v^{\theta_2 + 1}}, \quad v \geq \theta_1 > 0, \quad \theta_2 > 0, $$

where $\theta_2$ and $\theta_1$ are parameterized as functions of X and $\beta$ (but this dependence is suppressed for
notational convenience). E.g. $\theta_1(X, \beta) = \exp(\beta_1' Z)$ and $\theta_2(X, \beta) = \exp(\beta_2' Z)$.

Assuming the Bayesian Nash Equilibrium solution concept, the equilibrium bidding function
satisfies

$$ \sigma(v) = v + \frac{\int_v^{\infty} (1 - F_V(u | x))^m \, du}{(1 - F_V(v | x))^m}, $$

which is the cost plus the expected net revenue conditional on winning the auction. Evaluating $\sigma(v)$
at $v = \theta_1$ gives the conditional support for the winning bid. As shown in Paarsch (1992), this implies
the following density function of the winning bid Y, which is the first order statistic generated by
the specified bidding rule, conditional on covariates X:

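$$ f_Y(y | x, \theta_1, \theta_2) = \frac{\theta_2 m}{y^{\theta_2 m + 1}} \left[ \frac{\theta_1 \theta_2 (m - 1)}{\theta_2 (m - 1) - 1} \right]^{\theta_2 m} 1\left( y \geq \frac{\theta_1 \theta_2 (m - 1)}{\theta_2 (m - 1) - 1} \right). $$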


8A referee pointed out the following example. Suppose $g(\theta) = \theta_1 \theta_2$, where $\theta_2$ also affects the shape of the
error distribution. Although this example does not correspond to the economic model we used in the simulations, it
highlights the important issue of reparameterization. In this case the asymptotic theory requires reparameterization
into $\beta = \theta_1 \theta_2$ and $\alpha = \theta_1$, and then the estimates of $\theta_1$ and $\theta_2$ are deduced from the estimates of $\alpha$ and $\beta$, and Wald
type inference may be carried out using the Delta method (preferably based on the second order expansion, so that
finite-sample estimation uncertainty about $\beta$ is not neglected). E.g. for $\theta_2 = \beta/\alpha$, $(\hat{\theta}_2 - \theta_2) \approx (\hat{\beta} - \beta)/\alpha - (\hat{\alpha} - \alpha)/\alpha^2 +
2(\hat{\alpha} - \alpha)^2/\alpha^3 \approx n^{-1} Z_\beta/\alpha + n^{-1/2} Z_\alpha/\alpha^2 + 2n^{-1}(Z_\alpha)^2/\alpha^3$, where $Z_\beta$ and $Z_\alpha$ are the limit distributions of $\hat{\beta}$ and
$\hat{\alpha}$. This expansion can be important because in finite samples, variability of estimates of $\beta$ may be of comparable or
larger order than that of $\alpha$, motivating this expansion. Of course, one could use the first order Taylor expansion too,
$(\hat{\theta}_2 - \theta_2) \approx -(\hat{\alpha} - \alpha)/\alpha^2 + o_p(1)$, but this approximation is less accurate.
Therefore, this is an example of a one-sided regression model (2.1) where

$$ Y_i = g(X_i, \beta) + \epsilon_i, $$
with

$$ g(X, \beta) = \theta_1(X, \beta) \cdot \theta_2(X, \beta) \cdot (m - 1) / (\theta_2(X, \beta)(m - 1) - 1), $$
and $\epsilon_i$ has density $f(\epsilon | X, \beta) = f_Y(g(X, \beta) + \epsilon | X, \beta)$ conditional on X.
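The boundary function of this example is simple to evaluate numerically. The sketch below assumes the exponential-index parameterization mentioned above as an example, with theta_1 = exp(beta_1'Z) and theta_2 = exp(beta_2'Z), and takes m to enter exactly as it is written in the displayed formula for g. The function name `boundary` and the argument layout are hypothetical. The guard reflects that the formula gives a finite positive boundary only when theta_2 (m - 1) > 1.

```python
import numpy as np

def boundary(z, m, beta1, beta2):
    """Evaluate g = theta1 * theta2 * (m - 1) / (theta2 * (m - 1) - 1).

    z may be one covariate vector (shape (k,)) or a matrix of them (shape (n, k));
    beta1 and beta2 are coefficient vectors of length k (assumed parameterization).
    """
    z = np.atleast_1d(np.asarray(z, dtype=float))
    theta1 = np.exp(z @ np.asarray(beta1, dtype=float))
    theta2 = np.exp(z @ np.asarray(beta2, dtype=float))
    denom = theta2 * (m - 1) - 1.0
    if np.any(denom <= 0.0):
        raise ValueError("need theta2 * (m - 1) > 1 for a finite boundary")
    return theta1 * theta2 * (m - 1) / denom

# One auction with an intercept-only covariate: theta1 = 1, theta2 = 2, m = 3.
g_value = float(boundary([1.0], 3, beta1=[0.0], beta2=[np.log(2.0)]))
```

Every observed winning bid must lie on or above this boundary, which is what makes the likelihood discontinuous in beta.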

2.2.  Regularity  Conditions.  The  main  regularity  conditions  C0-C5  are  collected  in  Appendix 
A.  They  serve  to  impose  five  basic  types  of  assumptions: 

(a)  identification  and  compact,  convex  parameter  space  (with  true  parameters  in  the  interior), 

(b) continuous differentiability of the regression function $g(x, \beta)$ in $\beta$,

(c) nondegeneracy and boundedness of the vector $\partial g(X, \beta) / \partial \beta$,

(d) continuous differentiability and boundedness of the density function $f(\epsilon | x, \gamma)$, of its partial
first derivatives in $\gamma$ and $\epsilon$, and of the second partial derivatives in $\gamma$ (except at $\epsilon = 0$),

(e) continuous differentiability and integrability of the first and the second partial derivatives
of $\ln f(\epsilon | x, \gamma)$ in $\gamma$.

Conditions  of  types  (a)  -  (c)  are  standard  in  nonlinear  likelihood  analysis.  Smoothness  condi- 
tions of  type  (d)  represent  a  generalization  of  the  conditions  of  Ibragimov  and  Has'minskii  (1981a). 
Conditions  of  type  (e)  are  the  standard  conditions  for  regular  smooth  likelihood  models,  e.g.  as  in 
van der Vaart (1999), Chapter 7. Conditions of type (e) reflect that inference about $\alpha$ is standard
if $\beta$ is known.

These  conditions  are  flexible  enough  to  cover  various  auction  models,  frontier  production  function 
models,  and  equilibrium  search  models.9 

2.3. Definitions of Estimation Procedures and Informal Overview of Results. Define the
likelihood function as10

$$ L_n(\gamma) = \prod_{i \leq n} f(Y_i - g(X_i, \beta) | X_i; \gamma). \tag{2.3} $$

The  optimal  Bayes  estimators  are  the  likelihood-based  estimators  that  minimize  the  average 
expected  risk,  where  the  risk  is  computed  under  different  parameter  values  and  then  averaged  over 


9A technical addendum gives an example of verification of these conditions in the auction model that underlies
our Monte-Carlo simulations.

10The likelihood can be made unconditional by multiplying through with the density (probability mass) function
of $(X_i, i \leq n)$. This term is omitted because this additional term does not affect the definition of the likelihood ratio
or can be otherwise canceled out.


these  parameter  values.  The  procedures  are  generally  of  the  following  form: 

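$$ \hat{\gamma} = \arg\inf_{\tilde{\gamma} \in \mathcal{G}} \int_{\mathcal{G}} \rho_n(\tilde{\gamma} - \gamma) \frac{L_n(\gamma) \mu(\gamma)}{\int_{\mathcal{G}} L_n(\gamma') \mu(\gamma') \, d\gamma'} \, d\gamma, \tag{2.4} $$

where $\rho_n(\gamma) = \rho(n\beta, \sqrt{n}\alpha)$ is a loss function, $\mu(\cdot)$ is the weight density function (prior density) on
$\mathcal{G}$, and $L_n(\gamma)\mu(\gamma) / \int_{\mathcal{G}} L_n(\gamma')\mu(\gamma') \, d\gamma'$ is the posterior density. The optimality properties of the Bayes
procedures carry over to the limit.11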

The loss function $\rho_n$ is made explicitly dependent on the sample size for purposes of asymptotic
analysis, as in Ibragimov and Has'minskii (1981b), but this may be ignored in practice. Convexity
and standard conditions are imposed on the loss function $\rho$ and the prior $\mu$, and collected as D1-D3
in Appendix A. Examples of such loss functions include

(A) $\rho(z) = z'z$, a quadratic loss function,

(B) $\rho(z) = \sum_{j=1}^{d} |z_j|$, an absolute deviation loss function,

(C) $\rho(z; \tau) = \sum_{j=1}^{d} (1(z_j > 0) - \tau) z_j$, $\tau \in (0, 1)$, a variant of the Koenker and Bassett (1978)
check deviation loss function.

Solutions of (2.4) with loss functions (A), (B), (C) generate BEs $\hat{\gamma}$ that are, respectively, (a) a vector
of posterior means, (b) a vector of posterior medians (for each parameter component), (c) a vector
of posterior $\tau$-th quantiles.
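The correspondence between the three loss functions and the posterior mean, median, and quantile can be verified numerically for a single parameter component. In the sketch below the gamma-distributed draws are merely a stand-in for posterior draws, and the grid search, function names, and tuning values are illustrative assumptions rather than the procedure used in the paper.

```python
import numpy as np

def quadratic_loss(z):
    return z ** 2

def absolute_loss(z):
    return np.abs(z)

def check_loss(z, tau):
    # (1(z > 0) - tau) * z, i.e. loss (C) for a scalar component
    return ((z > 0).astype(float) - tau) * z

def bayes_estimate(draws, loss, grid):
    """Minimize the average posterior loss over a grid of candidate values."""
    z = grid[:, None] - draws[None, :]  # candidate minus draw
    avg_loss = loss(z).mean(axis=1)
    return grid[np.argmin(avg_loss)]

rng = np.random.default_rng(0)
draws = rng.gamma(shape=2.0, scale=1.0, size=1001)  # stand-in for posterior draws
grid = np.linspace(draws.min(), draws.max(), 2001)
step = grid[1] - grid[0]
tau = 0.25

be_mean = bayes_estimate(draws, quadratic_loss, grid)
be_median = bayes_estimate(draws, absolute_loss, grid)
be_quantile = bayes_estimate(draws, lambda z: check_loss(z, tau), grid)
```

Up to the grid spacing `step`, the three minimizers agree with `draws.mean()`, `np.median(draws)`, and `np.quantile(draws, tau)`, so in practice the summary statistics can be computed directly without any optimization.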

Since BEs become very difficult to compute when $\rho$ is not convex, we focus on convex loss
functions for pragmatic reasons. However, proofs of the main results apply more generally to other
loss functions specified in Ibragimov and Has'minskii (1981b). In practice, $\hat{\gamma}$ can be computed using
Markov Chain Monte Carlo methods, which produce a sequence of draws

$$ (\gamma^{(1)}, \ldots, \gamma^{(b)}), \tag{2.5} $$

whose marginal distribution is given by the posterior. Appropriate statistics of that sequence can be
taken depending on the choice of $\rho$. (E.g. the means or component-wise medians for cases (A) and
(B) above.) More generally, estimators $\hat{\gamma}$ are solutions of well defined globally convex differentiable
optimization problems.12
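As a minimal sketch of how such draws might be produced, consider a deliberately simple one-sided model that is an assumption made here for illustration and is not the auction model of the paper: Y_i = beta * x_i + e_i with e_i following a standard exponential distribution and a flat prior on [0, beta_upper]. The random-walk Metropolis sampler, its step size, and all names below are hypothetical choices. In this toy model the posterior is proportional to exp(beta * sum(x)) on beta <= min(y / x), so its mean is approximately min(y / x) - 1 / sum(x), which gives a direct check on the sampler.

```python
import numpy as np

def log_posterior(beta, y, x, beta_upper=10.0):
    """Log posterior up to a constant under a flat prior on [0, beta_upper]."""
    if beta < 0.0 or beta > beta_upper:
        return -np.inf
    resid = y - beta * x
    if np.any(resid < 0.0):  # an observation below the boundary has zero likelihood
        return -np.inf
    return -np.sum(resid)    # Exp(1) errors: log f(e) = -e for e >= 0

def metropolis(y, x, n_draws=20000, burn_in=2000, seed=0):
    """Random-walk Metropolis draws of beta (illustrative tuning)."""
    rng = np.random.default_rng(seed)
    step = 2.0 / np.sum(x)          # the posterior scale is of order 1/n
    beta = 0.99 * np.min(y / x)     # feasible starting value
    logp = log_posterior(beta, y, x)
    draws = np.empty(n_draws)
    for t in range(n_draws + burn_in):
        prop = beta + step * rng.standard_normal()
        logp_prop = log_posterior(prop, y, x)
        if np.log(rng.uniform()) < logp_prop - logp:
            beta, logp = prop, logp_prop
        if t >= burn_in:
            draws[t - burn_in] = beta
    return draws

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1.0, 2.0, size=n)
y = 2.0 * x + rng.exponential(1.0, size=n)  # true beta = 2

draws = metropolis(y, x)
post_mean = float(draws.mean())                # BE under quadratic loss
post_median = float(np.median(draws))          # BE under absolute loss
post_q25 = float(np.quantile(draws, 0.25))     # BE under check loss, tau = 0.25
```

The sampler only ever evaluates the likelihood, so the discontinuity at the boundary causes no difficulty: infeasible proposals are simply rejected.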

The computational attractiveness of estimation and inference based on the Bayes procedures stems
from the use of Markov Chain Monte Carlo (MCMC) and the statistical motivation behind the definition
of the Bayes procedures. Since the Bayes estimates and the interval estimates are typically means,
medians, or quantiles of the posterior distribution, by drawing the MCMC sample of size b from the
posterior distribution, we can compute these quantities with an accuracy of order $1/\sqrt{b}$. In contrast,


11Furthermore, another motivation for (2.3) is that any optimal (admissible) estimation procedure is a Bayes
procedure or a Bayes procedure with improper priors, cf. Wald (1950).

12Given the MCMC series (2.5), $\hat{\gamma}$ solves $\arg\inf_{\tilde{\gamma} \in \mathcal{G}} \frac{1}{b} \sum_{t=1}^{b} \rho_n(\tilde{\gamma} - \gamma^{(t)})$, which is a globally convex and smooth
(if $\rho_n$ is smooth) optimization problem.
the computation of exact MLE requires optimization of a highly non-convex, discontinuous and
otherwise highly nonlinear likelihood. MLE can be computed by grid-based algorithms or MCMC
only with an accuracy that worsens exponentially in the parameter dimension.

The BEs and the MLE are consistent and it is shown in this paper that

$$ \hat{\beta} - \beta = O_p(n^{-1}) \quad \text{and} \quad \hat{\alpha} - \alpha = O_p(n^{-1/2}). \tag{2.6} $$

The  BE's  are  shown  to  converge  in  distribution  to  Pitman13  functionals  of  the  limit  likelihood  ratio 
process.  We  first  develop  a  complete  large  sample  theory  of  likelihood  for  these  models,  which 
is  a  prerequisite  for  any  inference  based  on  the  likelihood  principle.  In  particular,  we  obtain  an 
explicit  form  of  the  limit  likelihood  ratio  process  as  a  function  of  a  Poisson  process  that  can  be 
easily  simulated. 

This  result  implies  that  the  limit  distributions  of  the  estimators  can  be  simulated  for  purposes  of 
Wald  type  inference  through  either  (a)  simulation  of  the  limit  likelihood  process,  or  (b)  resampling 
techniques  including  subsampling  and  parametric  bootstrap.  Subsampling  may  be  more  robust  than 
other  methods  under  local  misspecification  of  the  parametric  assumptions.  However,  the  resampling 
methods are much more computationally expensive than
Bayes  type  inference.  Simulating  the  limit  distribution  is  comparable  in  terms  of  the  computational 
expense  to  Bayes  inference  due  to  the  linearity  of  the  limit  process. 
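For intuition about Wald type inference by parametric bootstrap, the sketch below works in the univariate uniform model U(0, beta) rather than in the regression models of this paper, so it should be read only as an illustration under that simplifying assumption. The function name, the nominal `level`, and the number of bootstrap replications are hypothetical choices. The statistic n * (beta_hat - beta) is approximated by its bootstrap analogue simulated from U(0, beta_hat), and the resulting quantiles are inverted for beta.

```python
import numpy as np

def bootstrap_interval(y, level=0.90, n_boot=2000, seed=0):
    """Parametric-bootstrap interval for beta in the U(0, beta) model."""
    rng = np.random.default_rng(seed)
    n = y.size
    beta_hat = y.max()  # MLE of the upper endpoint
    y_star = rng.uniform(0.0, beta_hat, size=(n_boot, n))
    # Bootstrap copies of n * (beta_hat - beta); all are <= 0 by construction.
    stat = n * (y_star.max(axis=1) - beta_hat)
    lo_q, hi_q = np.quantile(stat, [(1 - level) / 2, (1 + level) / 2])
    # Invert lo_q <= n * (beta_hat - beta) <= hi_q for beta.
    return beta_hat, (beta_hat - hi_q / n, beta_hat - lo_q / n)

rng = np.random.default_rng(1)
y = rng.uniform(0.0, 2.0, size=200)
beta_hat, (lower, upper) = bootstrap_interval(y)
```

Because the MLE never exceeds the true endpoint, the interval lies entirely at or above `beta_hat`, and its width shrinks at rate 1/n. Each interval requires `n_boot` re-estimations, which is where the computational cost of resampling comes from when the estimator itself is expensive.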

An  attractive  practical  alternative  is  the  Bayes  inference  based  on  the  posterior  quantiles.  Our 
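results establish its large sample frequentist validity. Consider constructing a $\tau \times 100\%$ confidence
interval for $r_n(\gamma)$, where $r_n$ is a smooth real function that possibly depends on n. Define the $\tau$-th
quantile of the posterior distribution as

$$ c(\tau) = \arg\inf_{r \in \mathcal{R}_n} \int_{\mathcal{G}} \rho(r - r_n(\gamma); \tau) \frac{L_n(\gamma) \mu(\gamma)}{\int_{\mathcal{G}} L_n(\gamma') \mu(\gamma') \, d\gamma'} \, d\gamma, \tag{2.7} $$

where $\rho(z; \tau)$ is the check function defined above, and $\mathcal{R}_n = \{r_n(\gamma), \gamma \in \mathcal{G}\}$. In practice, $c(\tau)$ is
computed by taking the $\tau$-th quantile of the MCMC sequence evaluated at $r_n$,

$$ (r_n(\gamma^{(1)}), \ldots, r_n(\gamma^{(b)})). \tag{2.8} $$

The resulting $\tau \times 100\%$-confidence intervals are given by

$$ [c(\tau/2), c(1 - \tau/2)], \quad \text{where} \quad \lim_{n \to \infty} P_{\gamma_0}\{c(\tau/2) < r_n(\gamma_0) < c(1 - \tau/2)\} = \tau, \tag{2.9} $$

under mild conditions on $r_n$, which is one of the main results of this paper.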

A  pragmatic  motivation  for  Bayesian  intervals  is  that  the  empirical  researcher  does  not  need  to 
have  detailed  knowledge  of  complex  asymptotic  limit  theory  to  apply  them.  She  can  simply  compute 
the  intervals  through  generic  MCMC  methods,  and  then  rely  upon  the  present  results  that  establish 
the  large  sample  frequentist  validity  of  these  intervals. 
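
As an illustration, the computation described in (2.8) and (2.9) can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation: the function names `empirical_quantile` and `posterior_interval`, and the representation of the MCMC output as a plain list of parameter draws, are assumptions made here for illustration only.

```python
import math

def empirical_quantile(values, tau):
    """Return the smallest order statistic whose empirical cdf is at least tau."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    ordered = sorted(values)
    k = max(math.ceil(tau * len(ordered)) - 1, 0)
    return ordered[k]

def posterior_interval(draws, r, tau):
    """Equal-tailed interval [c(tau/2), c(1 - tau/2)] for r(gamma) from MCMC draws."""
    # Evaluate the function of interest at every draw, as in (2.8).
    transformed = [r(gamma) for gamma in draws]
    return (empirical_quantile(transformed, tau / 2),
            empirical_quantile(transformed, 1 - tau / 2))
```

For a component of the parameter vector one would pass, for example, `r = lambda gamma: gamma[0]`. No limit theory enters the computation; the limit theory is needed only to justify the frequentist interpretation of the resulting interval.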
13 We follow the terminology of Ibragimov and Has'minskii (1981b) p. 21.


Another classical procedure is the MLE, which is defined by maximizing the likelihood function:

$$\hat\gamma = (\hat\beta', \hat\alpha')' = \arg\sup_{\gamma} L_n(\gamma).$$

The MLE is a limit of BEs under any sequence of loss functions that approximates the delta function.
We shall only briefly discuss the limit distribution of the exact MLE for editorial reasons. A detailed
analysis of the MLE is given in the technical report, cf. Chernozhukov and Hong (2003). The MLE
converges in distribution to a random variable that maximizes the limit likelihood ratio.

3.  Large  Sample  Theory 

This  section  contains  the  main  formal  results  of  the  paper.  Section  3.1  examines  the  large  sample 
properties  of  the  likelihood  ratio  function.  Characterization  of  the  limiting  behavior  of  the  likelihood 
is  necessary  for  obtaining  all  of  the  main  results  and  is  useful  for  any  likelihood  based  inference 
methods. Section 3.2 provides an intuitive discussion of this result and subsequent results through
an  example.  Section  3.3  describes  the  large  sample  properties  of  optimal  Bayes  estimators  and  both 
Wald  and  Bayes  type  inference  procedures.  Section  3.5  briefly  discusses  the  limit  theory  of  exact 
MLE. 

3.1.  Large  Sample  Theory  for  the  Likelihood.  A  common  first  step  in  modern  asymptotic 
analysis  is  to  find  the  finite-dimensional  marginal  limit  of  the  likelihood  ratio  process  or  other 
criterion  functions,  e.g.  van  der  Vaart  (1999)  and  Knight  (2000).  After  appropriate  strengthening, 
the  limit  serves  to  describe  the  asymptotic  distribution  of  all  likelihood  based  estimators.  Such  an 
initial  step  is  sometimes  called  the  convergence  of  experiments,  see  van  der  Vaart  (1999). 

Consider the local likelihood ratio function

$$\ell_n(z) = L_n(\gamma_n(\delta) + H_n z)/L_n(\gamma_n(\delta)),$$

where $\gamma_n(\delta) = \gamma_0 + H_n\delta$ denotes the true parameter sequence, $\delta \in \mathbb{R}^d$, and $H_n$ is a diagonal
matrix with $1/n$ in the first $d_\beta = \dim(\beta)$ diagonal entries and $1/\sqrt{n}$ in the remaining $d_\alpha = \dim(\alpha)$
diagonal entries. Consideration of the local parameter sequence is necessary for subsequent results.
The scaling by $H_n$ corresponds to the convergence rates $\sqrt{n}$ for $\alpha$ and $n$ for $\beta$.14

The function $\ell_n(z)$ is said to converge in distribution to $\ell_\infty(z)$ in the finite-dimensional sense if for
any finite $k$

$$(\ell_n(z_j),\ j \le k) \to_d (\ell_\infty(z_j),\ j \le k),$$  (3.1)

and $\ell_\infty(\cdot)$ is called a finite-dimensional limit. In this section, $\to_d$ denotes convergence in distribution
under $P_{\gamma_n(\delta)}$. We partition the localized parameter $z$ accordingly into $z = (u', v')'$, where $u \in \mathbb{R}^{d_\beta}$
corresponds to the localized location parameters and $v \in \mathbb{R}^{d_\alpha}$ corresponds to the localized shape
parameters.

14 The convergence rates are established as parts of the proof of the subsequent theorems, and follow from the
exponential decay of the likelihood tails, $E\,\ell_n^{1/2}(z) \le \text{const} \cdot e^{-c|z|}$ as $|z| \to \infty$, see the proof of Theorem 3.2.

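Theorem 3.1 (Limits of the Likelihood Function). Given conditions C0-C5 collected in Appendix A,
the finite-dimensional weak limit of the likelihood ratio process $\ell_n(z)$ takes the following
form: for $\Delta(x) = \partial g(x,\beta_0)/\partial\beta$, $p(X) = p(X,\gamma_0)$, $q(X) = q(X,\gamma_0)$, and $l_i(\gamma) = \ln f(\epsilon_i \mid X_i, \gamma)$,

$$\ell_\infty(z) = \ell_{1\infty}(v) \times \ell_{2\infty}(u),$$

$$\ell_{1\infty}(v) = \exp\big(W'v - v'Jv/2\big),$$  (3.2)

$$\ell_{2\infty}(u) = \exp\Big(u'm + \int_{\mathbb{R}\times\mathcal{X}} l_u(j,x)\,dN(j,x)\Big),$$

where $J = E_{P_{\gamma_0}}\big[\partial_\alpha l_i(\gamma_0)\,\partial_\alpha l_i(\gamma_0)'\big]$, $m = E_{P_{\gamma_0}}\Delta(X)[p(X) - q(X)]$, $W \stackrel{d}{=} N(0,J)$, and

$$l_u(j,x) = \ln\frac{q(x)}{p(x)}\,1[0 < j < \Delta(x)'u] + \ln\frac{p(x)}{q(x)}\,1[0 > j > \Delta(x)'u]$$

[where Ibragimov and Has'minskii (1981b)'s convention applies to the case when $q(x) = 0$: $\ln 0 = -\infty$,
$\ln\infty = \infty$, $1/0 = \infty$, and $\infty \cdot 0 = 0$; see equation (3.6) below].

$N$ is a Poisson random measure $N(\cdot) = \sum_{i=1}^{\infty} 1[(J_i, X_i) \in \cdot\,] + \sum_{i=1}^{\infty} 1[(J_i', X_i') \in \cdot\,]$, where

$$J_i = \Gamma_i/p(X_i), \quad \Gamma_i = \mathcal{E}_1 + \ldots + \mathcal{E}_i, \quad i \ge 1,$$  (3.3)

$$J_i' = \Gamma_i'/q(X_i'), \quad \Gamma_i' = -(\mathcal{E}_1' + \ldots + \mathcal{E}_i'), \quad i \ge 1,$$  (3.4)

$\{X_i, \mathcal{E}_i, i \ge 1\}$ is an iid sequence of variables where $X_i$ follows law $F_X$ and $\mathcal{E}_i$ is a unit exponential
variable. $\{X_i', \mathcal{E}_i', i \ge 1\}$ is an independent copy of $\{X_i, \mathcal{E}_i, i \ge 1\}$, and both sequences are independent
of $W$.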

Remark 3.1 (Alternative Form). To analyze the limit $\ell_{2\infty}(u)$ further, write the Lebesgue integral
$\int_{\mathbb{R}\times\mathcal{X}} l_u(j,x)\,dN(j,x)$ appearing in the statement of Theorem 3.1 as

$$\sum_{i=1}^{\infty} l_u(J_i, X_i) + \sum_{i=1}^{\infty} l_u(J_i', X_i') = \sum_{i=1}^{\infty} \ln\frac{q(X_i)}{p(X_i)}\,1[0 < J_i < \Delta(X_i)'u] + \sum_{i=1}^{\infty} \ln\frac{p(X_i')}{q(X_i')}\,1[0 > J_i' > \Delta(X_i')'u],$$  (3.5)

which is a simple function of the variables $\{X_i, X_i', J_i, J_i'\}$. This suggests that the limit likelihood
function can be simulated simply by generating sequences of $\{X_i, X_i', J_i, J_i', i \le b\}$ according to the
distributions specified in Theorem 3.1 for large $b$, and then evaluating the corresponding expressions.
In practice, the quantities $p(X_i)$ and $q(X_i)$ are replaced by their estimates, and $F_X$ is replaced by
the empirical distribution function. This replacement is permissible for purposes of large sample
inference, see e.g. Chernozhukov (2001).
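
The simulation recipe of Remark 3.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the names `simulate_points` and `log_l2_limit` are hypothetical, the functions `p`, `q`, `delta` and the sampler `draw_x` are taken as known Python callables (in practice they would be replaced by estimates and by draws from the empirical distribution of the regressors), and the infinite sums are truncated at `b` points, which is adequate only when `b` is large relative to the values of `u` being evaluated.

```python
import math
import random

def simulate_points(draw_x, p, q, b, rng):
    """Draw b points (J_i, X_i) as in (3.3) and b points (J'_i, X'_i) as in (3.4)."""
    upper, lower = [], []
    gamma = gamma_prime = 0.0
    for _ in range(b):
        gamma += rng.expovariate(1.0)         # Gamma_i = E_1 + ... + E_i
        x = draw_x(rng)
        upper.append((gamma / p(x), x))
        gamma_prime -= rng.expovariate(1.0)   # Gamma'_i = -(E'_1 + ... + E'_i)
        x_prime = draw_x(rng)
        qx = q(x_prime)
        # Convention 1/0 = infinity: with q = 0 the point is sent to -infinity.
        lower.append((gamma_prime / qx if qx > 0 else -math.inf, x_prime))
    return upper, lower

def log_l2_limit(u, m, delta, p, q, upper, lower):
    """Evaluate ln l_2inf(u) = u'm + sum of l_u over the simulated points, as in (3.5)."""
    dot = lambda a, c: sum(ai * ci for ai, ci in zip(a, c))
    total = dot(u, m)
    for j, x in upper:
        if 0.0 < j < dot(delta(x), u):
            if q(x) == 0:
                return -math.inf              # convention ln 0 = -infinity
            total += math.log(q(x) / p(x))
    for j, x in lower:
        if 0.0 > j > dot(delta(x), u):
            total += math.log(p(x) / q(x))
    return total
```

Evaluating `log_l2_limit` on a grid of `u` values for one set of simulated points gives one realization of the limit log-likelihood; repeating over independent sets of points gives draws from its distribution.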
Remark 3.2 (Boundary Case). There is a drastic simplification of $\ell_{2\infty}(u)$ in the one-sided
(boundary) model. Since $q(X) = 0$ a.s., using the rules stated in Theorem 3.1,

$$\sum_{i=1}^{\infty} l_u(J_i, X_i) + \underbrace{\sum_{i=1}^{\infty} l_u(J_i', X_i')}_{=0} = \sum_{i=1}^{\infty} -\infty \cdot 1[0 < J_i < \Delta(X_i)'u] = \begin{cases} 0, & \text{if } J_i \ge \Delta(X_i)'u \text{ for all } i \ge 1, \\ -\infty, & \text{otherwise.} \end{cases}$$  (3.6)

Hence for $m = E\Delta(X)p(X)$,

$$\ell_{2\infty}(u) = \begin{cases} \exp(u'm), & \text{if } J_i \ge \Delta(X_i)'u \text{ for all } i \ge 1, \\ 0, & \text{otherwise.} \end{cases}$$  (3.7)

Thus in the one-sided models, the limit depends only on the set of variables in (3.3) and does not
depend on the variables in (3.4).

Remark 3.3 (Robustness to Misspecification). It is not difficult to observe from the proof of
Theorem 3.1 (hence Theorems 3.2-3.4) that the limit theory for $\beta$ is robust under local misspecification
of the regression function $g(x,\beta)$ of order $o(1/n)$, and local misspecification of the height
of densities at the jump points $p(x,\gamma)$ and $q(x,\gamma)$ of order $o(1/n)$. It also appears that the qualitative
nature of the limit theory would be preserved under local $O(1/n)$-misspecification of the
regression function $g(x,\beta)$ and possibly under $O(1)$ misspecification of $p(x,\gamma)$ and $q(x,\gamma)$ as long as
$p(x,\gamma) > q(x,\gamma)$ for all $x$. Given that these conditions hold, the inference about $\alpha$ appears to
be robust up to local $o(1)$-violations of the information matrix equality for $\alpha$. A formal development
of these results is beyond the scope of this paper.

Theorem 3.1 extends the results of Donald and Paarsch (1993a) on the boundary models with
discrete regressors and the results of Ibragimov and Has'minskii (1981a) on the univariate models.
Despite its unusual form, the limit likelihood has a simple structure. The term $\ell_{1\infty}(v)$ is a standard
expression for the limit likelihood ratio in regular models, and inference about the shape parameter
$\alpha$ is thus asymptotically regular. The limit log-likelihood has a standard linear-quadratic expression:

$$v'W - v'Jv/2.$$

This limit contains a normal vector $W \stackrel{d}{=} N(0,J)$ and the information matrix $J$. This implies,
for example, that conventional estimators of $\alpha$, such as the posterior mean and the MLE, have the
standard limit distribution

$$J^{-1}W \stackrel{d}{=} N(0, J^{-1}).$$

Because $\beta$ is unknown, the limit likelihood also includes a nonstandard term $\ell_{2\infty}(u)$. The discontinuities
in the density are highly informative about $\beta$ and are of a local nature. A lot of information
about $\beta$ is contained in the observations $Y_i$ that are near the location of the discontinuity $g(X_i,\beta)$,
that is, for those $Y_i$ such that

$$\epsilon_i = Y_i - g(X_i,\beta)$$

is close to zero. Thus the behavior of the extreme (closest to zero) $\epsilon_i$'s determines the behavior of
$\ell_{2\infty}(u)$, as further explained in Section 3.2. Consequently, one expects that the rate of convergence
of likelihood-based estimators will be $n$ for $\beta$ (in contrast to $\sqrt{n}$ for $\alpha$), and that the behavior of
likelihood estimators of $\beta$ will be determined by $\ell_{2\infty}(u)$.

3.2. Informal Explanation Through an Example. Consider a simple model15

$$Y_i = X_i'\beta_0 + \epsilon_i, \quad \epsilon_i = \mathcal{E}_i,$$  (3.8)

where $\mathcal{E}_i$ is a standard unit exponential variable. This is a boundary model with the density at the
boundary equal to $p(X) = 1$. Assume that there are no shape parameters $\alpha$ (we do not discuss
inference about $\alpha$, as it is regular, as stated earlier). The model is a linearized, homoscedastic version
of more realistic nonlinear models.

Intuitively, the smallest values of $\epsilon_i$ will be most informative about $\beta$, as the likelihood function
will be positive only if $Y_i - X_i'\beta \ge 0$ for all $i$, that is, when $n\epsilon_i \ge X_i'n(\beta - \beta_0)$ for all $i$. Letting
$z = n(\beta - \beta_0)$, this constraint takes the form

$$n\epsilon_i \ge X_i'z, \quad \text{for all } i.$$

What we can learn about the parameter $\beta_0$ will depend on these constraints.

The likelihood for this example is $L_n(\beta) = \prod_{i \le n} e^{-\epsilon_i + X_i'(\beta - \beta_0)}\,1(n\epsilon_i \ge X_i'n(\beta - \beta_0))$. Hence the
likelihood ratio $L_n(\beta)/L_n(\beta_0)$ as a function of $z = n(\beta - \beta_0)$ takes the form

$$\ell_n(z) = \prod_{i \le n} \big(e^{-\epsilon_i + X_i'z/n}/e^{-\epsilon_i}\big)\,1(n\epsilon_i \ge X_i'z),$$

which further reduces to

$$\ell_n(z) = e^{\bar{X}'z}\,1(n\epsilon_i \ge X_i'z, \text{ for all } i).$$  (3.9)

Since $\bar{X} \to_p EX$, the behavior of $\ell_n(z)$ for fixed $z$ is determined by the lowest order statistics

$$n\epsilon_{(1)},\ n\epsilon_{(2)},\ n\epsilon_{(3)},\ \ldots$$

The Rényi representation, see e.g. Embrechts, Klüppelberg, and Mikosch (1997) p. 189, allows these
re-scaled order statistics to be represented almost surely as

$$\mathcal{E}_1,\quad \mathcal{E}_1 + \frac{n}{n-1}\mathcal{E}_2,\quad \mathcal{E}_1 + \frac{n}{n-1}\mathcal{E}_2 + \frac{n}{n-2}\mathcal{E}_3,\ \ldots,$$

where $\{\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_n\}$ is an iid sequence of unit-exponential variables. For given $z$, essentially only a
stochastically bounded number of order statistics, say $k$, matters in the constraints (3.9). Hence as
$n \to \infty$, for any finite $k$,

$$(n\epsilon_{(1)},\ n\epsilon_{(2)},\ \ldots,\ n\epsilon_{(k)}) \to_d \Big(\mathcal{E}_1,\ \mathcal{E}_1 + \mathcal{E}_2,\ \ldots,\ \sum_{j=1}^{k} \mathcal{E}_j\Big) = (\Gamma_1,\ \Gamma_2,\ \ldots,\ \Gamma_k).$$

Hence the marginal limit of $\ell_n(z)$ may be seen as

$$\ell_\infty(z) = e^{E(X)'z}\,1(\Gamma_i \ge X_i'z, \text{ for all } i \ge 1),$$

where $\{\Gamma_i\}$ is the sequence of gamma variables defined above, and $X_i$ is the iid sequence of regressors
with distribution $F_X$. Note that this is just a special case of the limit stated in equation (3.7), where
$p(X) = 1$. (Also there are no nuisance parameters in this example, so that $\ell_\infty(z) = \ell_{2\infty}(u)$.) The
definition of points $(\Gamma_i, X_i)$ is a special case of points $(J_i, X_i)$ stated in equation (3.3). The use of
point process methods in Theorem 3.1 formalizes the intuition described above and extends it to
more general heteroscedastic errors.

15 We thank a referee for suggesting using a similar example.
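
The convergence of the re-scaled lowest order statistics to partial sums of unit exponentials can be checked numerically. The following is a minimal sketch, with hypothetical function names, that generates the $k$ lowest re-scaled order statistics in two ways: directly by sorting $n$ exponential draws, and through the Rényi representation. For large $n$ the weights $n/(n-j+1)$ are close to one, so both should behave like $(\Gamma_1, \ldots, \Gamma_k)$.

```python
import random

def scaled_low_order_stats(n, k, rng):
    """Return n times the k smallest of n iid unit exponential draws."""
    eps = sorted(rng.expovariate(1.0) for _ in range(n))
    return [n * e for e in eps[:k]]

def renyi_low_order_stats(n, k, rng):
    """Same law via the Renyi representation: partial sums of n/(n-j+1) * E_j."""
    out, total = [], 0.0
    for j in range(1, k + 1):
        total += n / (n - j + 1) * rng.expovariate(1.0)
        out.append(total)
    return out
```

Averaging the first two components over many replications should give values near $1$ and $1 + n/(n-1)$ for either generator, which approach the means $1$ and $2$ of $\Gamma_1$ and $\Gamma_2$ as $n$ grows.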

The  result  stated  in  Theorem  3.1  is  more  complicated  for  the  following  reasons: 

1. In more general two-sided models, there is also an additional negative error in equations like
(3.8). The information about $\beta$ is then largely deduced from the $\epsilon_i$'s closest to zero from above and the
$\epsilon_i$'s closest to zero from below. This explains the presence of the additional set of gamma variables
and associated regressors in equation (3.4) as the limit distributions of "extremes from below".

2. The density of the $\epsilon_i$'s may vary near zero, which changes the hazard rates of the limit gamma
variables $\Gamma_i$ and $\Gamma_i'$, resulting in their division by varying hazard functions $p(X_i)$ or $q(X_i')$.

3. Uncertainty about the additional shape parameter $\alpha$ leads to the presence of an additional
term $\ell_{1\infty}(v)$. The form of this term reflects that the inference about $\alpha$ is fully regular. The limit
information about $\alpha$ is given by the limit average score $W$ and the information matrix $J$. Since
information about $\beta$ comes from a small portion of the entire sample and is based on extreme
type statistics, the average score $W$ is independent of those statistics asymptotically. This follows
from the standard proof of asymptotic independence of sample averages of general form and sample
minimal order statistics, see e.g. Resnick (1986) for a general treatment and van der Vaart (1999)
Lemma 21.19 for a simple example.

3.3. Large Sample Properties of Bayes Procedures. Given the above discussion, the following
Theorem 3.2 can be easily conjectured. The Bayes estimator $Z_n = (n(\hat\beta - \beta_n(\delta))', \sqrt{n}(\hat\alpha - \alpha_n(\delta))')'$,
centered at the true parameter $\gamma_n(\delta) = (\beta_n(\delta)', \alpha_n(\delta)')'$ and normalized by the convergence rates,
is related to the localized likelihood ratio $\ell_n(z)$ as follows: it minimizes the posterior loss redefined
in terms of the local deviation from the true parameter:

$$\Gamma_n(z) = \int_{\mathbb{R}^d} \rho(z - z')\,\pi_n(z')\,dz'.$$

Here $\pi_n(z)$ is the posterior density for the local deviation $z$ from the true parameter:

$$\pi_n(z) = \ell_n(z)\,\mu(\gamma_n(\delta) + H_n z) \Big/ \int_{\mathbb{R}^d} \ell_n(z')\,\mu(\gamma_n(\delta) + H_n z')\,dz',$$

where $\ell_n(z)$ is the local likelihood ratio process and $\mu$ is the prior density. As $n \to \infty$, it can
be conjectured from the discussion in the previous section that the posterior $\pi_n(z)$ approaches
$\pi_\infty(z) = \ell_\infty(z) / \int_{\mathbb{R}^d} \ell_\infty(z')\,dz'$. The limit local posterior density $\pi_\infty$ is a function of the likelihood
only and does not depend on prior information.

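Theorem 3.2 (Properties of BEs). Suppose that the conditions of Theorem 3.1 and D1-D3
hold. Then

1. The convergence rate is $n$ for estimating $\beta$ and $\sqrt{n}$ for estimating $\alpha$, i.e. $Z_n = O_p(1)$.

2. $Z_n \to_d Z$, where

$$Z = \arg\inf_{z \in \mathbb{R}^d} \int_{\mathbb{R}^d} \rho(z - z')\,\frac{\ell_\infty(z')}{\int_{\mathbb{R}^d} \ell_\infty(z'')\,dz''}\,dz'.$$  (3.10)

3. If $\rho(z) = \rho_\beta(u) + \rho_\alpha(v)$, then $n(\hat\beta_n - \beta_n(\delta)) \to_d Z^\beta = \arg\inf_{u \in \mathbb{R}^{d_\beta}} \int_{\mathbb{R}^{d_\beta}} \rho_\beta(u - u')\,\ell_{2\infty}(u')\,du'$ and
$\sqrt{n}(\hat\alpha_n - \alpha_n(\delta)) \to_d Z^\alpha = \arg\inf_{v \in \mathbb{R}^{d_\alpha}} \int_{\mathbb{R}^{d_\alpha}} \rho_\alpha(v - v')\,\ell_{1\infty}(v')\,dv'$, and $Z^\beta$ and $Z^\alpha$ are independent.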

Theorem 3.2 establishes the consistency, the rates of convergence, and the limit distributions
of the BEs. The limit is given in the form of a Pitman functional of a limit likelihood and
is not difficult to simulate using MCMC methods according to Remark 3.1. The result also justifies
the use of the parametric bootstrap, cf. Remark 3.5.

In the stated result, $Z^\beta$ and $Z^\alpha$ are independent due to the factorization of $\ell_\infty(z)$ into independent
terms $\ell_{1\infty}(v)$ and $\ell_{2\infty}(u)$. If $\rho(z) = \rho_\beta(u) + \rho_\alpha(v)$ does not hold, part 3 of Theorem 3.2 does not
apply. Also, the limit distribution of the Bayes estimator of the shape parameter $\alpha$ coincides with
that of the MLE if the loss function $\rho_\alpha$ is symmetric (by Anderson's lemma, see van der Vaart
(1999)), i.e. the limit distribution of $\hat\alpha$ is given by

$$N(0, J^{-1}).$$

This is not the case for the estimators of the location parameter $\beta$. Furthermore, as shown below,
the optimal estimators generally are not transformations of the MLE asymptotically, contrary to
the non-regression or dummy regression cases.

Remark 3.4 (Wald Inference with Subsampling). Theorem 3.2 immediately justifies the validity
of subsampling for Wald type inference. Subsampling approximates the distribution of the
estimator in the full sample based on values of this estimator in many smaller subsets of data.
Implementation protocols are standard and can be found in Politis, Romano, and Wolf (1999).
Theorem 2.2.1 in Politis, Romano, and Wolf (1999) applies provided (i) the estimates are consistent at
rates polynomial in $n$, and (ii) the estimates possess a limit distribution. Both of these conditions are
proven in Theorem 3.2. Thus, Theorem 3.2 immediately implies the validity of inference based on
subsampling. Subsampling may not be of as high quality as the parametric bootstrap or simulation of
the limit. However, subsampling is (a) computationally less demanding than the parametric bootstrap
and (b) likely to be more robust than other methods to local misspecification of parametric
models that changes the parameters of the limit distribution but does not affect the rates of convergence.
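
For concreteness, a subsampling interval for an $n$-consistent boundary estimator can be sketched as below. This is a minimal illustration under simplifying assumptions that are not taken from the paper: the model is the no-covariate boundary model $Y_i = \beta + \mathcal{E}_i$, the estimator is the sample minimum, random subsets are used in place of all subsets of size $b$, and the names `empirical_quantile` and `subsampling_interval` are hypothetical. The choice of the subsample size $b$ is left to the user.

```python
import random

def empirical_quantile(values, level):
    """Order statistic at position int(level * len(values)), clipped to the valid range."""
    ordered = sorted(values)
    k = min(max(int(level * len(ordered)), 0), len(ordered) - 1)
    return ordered[k]

def subsampling_interval(y, b, n_subsamples, tau, rng):
    """Subsampling interval for beta in Y = beta + E, using beta_hat = min(Y)."""
    n = len(y)
    beta_hat = min(y)
    # Root b * (beta_hat_b - beta_hat_n), computed on random subsets of size b,
    # approximates the law of n * (beta_hat_n - beta) because the rate is n.
    roots = [b * (min(rng.sample(y, b)) - beta_hat) for _ in range(n_subsamples)]
    lo_q = empirical_quantile(roots, tau / 2)
    hi_q = empirical_quantile(roots, 1 - tau / 2)
    # Invert lo_q <= n * (beta_hat - beta) <= hi_q for beta.
    return beta_hat - hi_q / n, beta_hat - lo_q / n
```

Because the sample minimum can only overestimate the boundary, both endpoints lie at or below the point estimate, which reflects the one-sided nature of the limit distribution.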

Remark 3.5 (Wald Inference with Parametric Bootstrap). As in Ibragimov and Has'minskii
(1981a), the weak convergence results and the proof can be stated uniformly in the parameter $\gamma$,
and conditional on almost every realization of the covariate sequence $\{X_i, i \le n\}$ ($n \to \infty$). In
order to do so, the notation must be made more complicated in a manner similar to Ibragimov and
Has'minskii (1981a) to denote the dependence of the limit on the parameter $\gamma$ and on the realization
of the covariate sequence. The uniform convergence in distribution is defined as the convergence
of distributions under the Lévy metric uniformly in the parameter $\gamma$, and conditional on covariate
sequences. This immediately implies that the parametric bootstrap is valid in the usual sense that
the bootstrap distribution converges to the limit distribution in probability under the Lévy metric
as long as the preliminary estimate $\tilde\gamma \to_p \gamma$, conditional on covariate samples. Any initial consistent
estimator $\tilde\gamma$ may be used. Hence the parametric bootstrap can be used for Wald type inference
based on the point estimates (see for example Horowitz (2000)). Although the Bayes estimates are
not difficult to recompute (especially with a good starting value such as the initial Bayes estimate
$\tilde\gamma$), the parametric bootstrap appears to be very expensive computationally. As discussed in Section
4, it is much more computationally demanding than any other method. The parametric bootstrap
may not be robust against local misspecification of the parametric models.

Next consider the posterior mean $\hat\gamma$ and posterior quantile $\hat\gamma(\tau)$ as the solutions to the problem
(2.4) under the squared loss and the check loss functions, respectively (each defined in Section 2).
Also define $Z$ and $Z(\tau)$ as the solutions of the limit problem (3.10) under the squared and the check
loss functions, respectively.

Theorem 3.3 (Mean-Unbiasedness, Quantile-Unbiasedness, Posterior Confidence Intervals).
Under the conditions of Theorem 3.2:

1. Posterior mean estimators are asymptotically mean-unbiased: $EZ = 0$.

2. Consider any $0 < \tau' < \tau'' < 1$. If $Z(\tau)$ has positive density in the neighborhood of 0, for $\tau = \tau'$
and $\tau = \tau''$, then posterior $\tau$-quantiles are $(1-\tau)$-quantile unbiased:

$$\lim_{n\to\infty} P_{\gamma_n(\delta)}\{(\hat\gamma(\tau))_j < (\gamma_n(\delta))_j\} = P_{\gamma_0}\{(Z(\tau))_j < 0\} = 1 - \tau,$$  (3.11)

where $(a)_j$ denotes the $j$-th component of vector $a$. Hence

$$\lim_{n\to\infty} P_{\gamma_n(\delta)}\{(\hat\gamma(\tau'))_j \le (\gamma_n(\delta))_j \le (\hat\gamma(\tau''))_j\} = \tau'' - \tau'.$$  (3.12)

A very useful implication of the quantile unbiasedness result is the validity of confidence intervals
$[(\hat\gamma(\tau'))_j, (\hat\gamma(\tau''))_j]$ for large sample inference on parameter components $(\gamma)_j$.

Results 1 and 2 follow from the asymptotic optimality of posterior means and quantiles under
the squared and check function losses, respectively, which is defined and established in Section 3.4.
For example, if the limit posterior mean $Z$ had a mean $EZ = c \ne 0$, then the estimator $\hat\gamma - H_n c$
would have a strictly lower asymptotic risk regardless of the local parameter sequence. Hence it must
be that $EZ = 0$. A similar argument applies to the $\tau$-th posterior quantile. The $\tau$-th posterior quantile
is $(1-\tau)$-quantile unbiased because it is asymptotically optimal under the $\tau$-check loss function. The
requirement that $Z_j(\tau)$ has positive density around 0 is technical.
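
Both unbiasedness properties can be verified by hand in the simplest case. The following worked computation is an illustration added here under explicit assumptions: it takes the exponential boundary example of Section 3.2 with no covariates ($X_i = 1$), so that the limit likelihood depends on the data only through $\Gamma_1$, a unit exponential variable, and it uses the limit posterior of Section 3.3.

```latex
\begin{align*}
\ell_\infty(z) &= e^{z}\,1(z \le \Gamma_1), \qquad
\pi_\infty(z) = \frac{\ell_\infty(z)}{\int \ell_\infty(z')\,dz'} = e^{z-\Gamma_1}\,1(z \le \Gamma_1), \\
Z &= \int z\,\pi_\infty(z)\,dz = \Gamma_1 - 1, \qquad EZ = E\Gamma_1 - 1 = 0, \\
Z(\tau) &= \Gamma_1 + \ln\tau \quad \text{since} \quad
\int_{-\infty}^{Z(\tau)} \pi_\infty(z)\,dz = e^{Z(\tau)-\Gamma_1} = \tau, \\
P\{Z(\tau) < 0\} &= P\{\Gamma_1 < -\ln\tau\} = 1 - e^{\ln\tau} = 1 - \tau .
\end{align*}
```

The limit posterior mean is thus the limit MLE $\Gamma_1$ shifted by its mean, and the limit posterior $\tau$-quantile falls below zero with probability exactly $1-\tau$, in agreement with parts 1 and 2 of the theorem.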

The next result concerns the asymptotic validity of the posterior quantiles $c(\tau)$ for inference
about smooth functions of the parameters. Consider inference about the function $r_n(\gamma)$, where
$r_n : \mathbb{R}^{d_\beta + d_\alpha} \to \mathbb{R}$ is such that for $a > 1$ and $R = [R_\beta', R_\alpha']'$ with rank $R = 1$:

$$r_n(\gamma) - r_n(\gamma_0) = R_\alpha'(\alpha - \alpha_0)\sqrt{n} + R_\beta'(\beta - \beta_0)n + O\big(n|\beta - \beta_0|^a + \sqrt{n}|\alpha - \alpha_0|^a\big).$$  (3.13)

For purposes of theoretical analysis, the function is made dependent on $n$ specifically to have a better
finite-sample approximation through the avoidance of the trivial case where all of the asymptotic
inference is determined by either parameter $\alpha$ or $\beta$ due to the difference in rates of convergence.
If a smooth function $m(\beta)$ is of prime interest, taking $r_n(\gamma) = n \cdot m(\beta)$ fulfills condition (3.13).
If a smooth function $m(\alpha)$ is of interest, then taking $r_n(\gamma) = \sqrt{n} \cdot m(\alpha)$ also fulfills condition
(3.13). Note that these transformations by $\sqrt{n}$ or $n$ do not affect the practical formulations (2.7)-(2.9)
in Section 2.3, by the linearity of the transformations and the equivariance of quantiles to monotone
transformations.

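Theorem 3.4 (Inference with Posterior Quantiles). Under the conditions of Theorem 3.2:

1. For any $0 < \tau < 1$, $(c(\tau) - r_n(\gamma_n(\delta))) \to_d Z(\tau)$, where

$$Z(\tau) = \arg\inf_{c \in \mathbb{R}} \int_{\mathbb{R}^d} \rho(c - R'z; \tau)\,\frac{\ell_\infty(z)}{\int_{\mathbb{R}^d} \ell_\infty(z')\,dz'}\,dz.$$

2. Provided $Z(\tau)$ has positive density over a neighborhood of 0 for $\tau = \tau'$ and $\tau = \tau''$,

$$\lim_{n\to\infty} P_{\gamma_n(\delta)}\{c(\tau) < r_n(\gamma_n(\delta))\} = P_{\gamma_0}\{Z(\tau) < 0\} = 1 - \tau,$$  (3.14)

and

$$\lim_{n\to\infty} P_{\gamma_n(\delta)}\{c(\tau') \le r_n(\gamma_n(\delta)) \le c(\tau'')\} = \tau'' - \tau'.$$  (3.15)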

Theorem 3.4 generalizes Theorem 3.3 to inference about more general functions than the construction
of parameter confidence intervals. For example, a $(1-\tau)$-level asymptotic test of the null
hypothesis $r_n(\gamma_0) = r_n$ is given by the decision rule that rejects the null if $r_n \notin [c(\tau/2), c(1-\tau/2)]$,
where (3.15) can be used to deduce the local power and consistency of the test.

3.4. Optimality. Lemma 3.1 below briefly records the finite-sample and asymptotic average risk
optimality properties of BEs, which are needed only for the auxiliary purposes of proving Theorems
3.3 and 3.4. A detailed and valuable analysis of (minimax) optimality in non-regular models can be
found in Hirano and Porter (2002).

Define the normalization matrix $H_n$ as in Section 3.1, and let $\gamma_n(\delta) = \gamma_0 + H_n\delta$, $\delta \in \mathbb{R}^d$, denote
the local parameter sequence. Consider the set $\mathcal{T}_n$ of all statistics (measurable mappings of data)
$\hat\gamma_n$. Define the expected risk associated with a loss function $\rho$ and estimator $\hat\gamma_n$ as $E_{P_{\gamma_n(\delta)}}\rho(Z_n)$,
where $Z_n = H_n^{-1}[\hat\gamma_n - \gamma_n(\delta)]$ and the expectation is computed under $\gamma_n(\delta)$. Consider the following
measures of risk.

The finite sample average risk (AR) of $\hat\gamma_n$ is given by:

$$\frac{1}{\lambda(K)} \int_K E_{P_{\gamma_n(\delta)}}\rho(Z_n)\,\mu(\gamma_n(\delta))\,d\delta,$$  (3.16)

where $\mu$ is the weight or prior measure over $K$, $\rho$ is the loss function over $K$, and $\lambda$ is the Lebesgue
measure. The asymptotic average risk (AAR) of the estimator sequence $\{\hat\gamma_n\}$ is given by

$$\limsup_{K \uparrow \mathbb{R}^d}\ \limsup_{n \to \infty}\ \frac{1}{\lambda(K)} \int_K E_{P_{\gamma_n(\delta)}}\rho(Z_n)\,d\delta,$$  (3.17)

where $K \uparrow \mathbb{R}^d$ denotes an increasing sequence of cubes centered at the origin and converging to $\mathbb{R}^d$.
Compared to the previous formula, the weight $\mu$ is replaced by the objective (uninformative) weight
over $\mathbb{R}^d$.

Lemma 3.1. Suppose the conditions of Theorem 3.2 hold. Let $\hat\gamma_{\rho,\mu,n} \in \mathcal{T}_n$ denote the Bayes
estimator under loss $\rho$ and prior weight $\mu$, $\hat{Z}_n = H_n^{-1}[\hat\gamma_{\rho,\mu,n} - \gamma_n(\delta)]$, and $U_n = n(\mathcal{B} - \beta_0) \times \sqrt{n}(\mathcal{A} - \alpha_0)$.

1. For each $n \ge 1$ the infimum of finite sample average risk for $K = U_n$ is achieved over $\mathcal{T}_n$
by the Bayes estimator $\hat\gamma_{\rho,\mu,n}$, i.e. at $Z_n = \hat{Z}_n$ in (3.16).

2. The infimum of asymptotic average risk over estimator sequences in $\mathcal{T}_n$ equals $E_{P_{\gamma_0}}\rho(Z) < \infty$
and is attained by the sequence of the Bayes estimators $\hat\gamma_{\rho,\mu,n}$, i.e. at $\{Z_n\} = \{\hat{Z}_n\}$ in
(3.17). ($Z$ denotes the weak limit of $\hat{Z}_n$.)

Statement 1 is a basic result of statistics, that the optimal estimator under loss $\rho$ is a Bayes
estimator defined by the risk-weighting function $\mu$ and a loss function $\rho$, cf. Wald (1950) and
Lehmann and Casella (1998), Chapter 5. Statement 1 is often simply used as an alternate definition
of the Bayes (optimal) procedures. Statement 2 translates finite-sample efficiency into asymptotic
average risk efficiency (this result essentially follows from Ibragimov and Has'minskii (1981b) p. 93).

It is critical that, unlike in the regular case, the efficiency rankings are largely determined by the
loss function $\rho$. For example, the MLE may be worse than the posterior mean under the squared loss,
but performs better under other loss functions, cf. Section 4.
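
The dependence of the ranking on the loss function can be seen in a small Monte Carlo experiment. The sketch below is an illustration under assumptions that are not part of the paper's experiments: it works in the limit experiment of the no-covariate exponential boundary example, where the limit likelihood is $e^{z}1(z \le \Gamma_1)$ with $\Gamma_1$ unit exponential, so that the limit MLE is $\Gamma_1$, the limit posterior mean is $\Gamma_1 - 1$, and the limit posterior median is $\Gamma_1 - \ln 2$. The function name `limit_risks` is hypothetical.

```python
import math
import random

def limit_risks(reps, rng):
    """Monte Carlo risks of three limit estimators in the no-covariate example."""
    # Each limit estimator error is Gamma_1 minus a fixed shift.
    shifts = {"mle": 0.0, "posterior_mean": 1.0, "posterior_median": math.log(2.0)}
    risks = {name: {"absolute": 0.0, "squared": 0.0} for name in shifts}
    for _ in range(reps):
        gamma1 = rng.expovariate(1.0)
        for name, shift in shifts.items():
            err = gamma1 - shift
            risks[name]["absolute"] += abs(err) / reps
            risks[name]["squared"] += err * err / reps
    return risks
```

In this special case the posterior median has the smallest mean absolute error and the posterior mean has the smallest mean squared error, so neither estimator dominates the other across loss functions.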

3.5. Large Sample Theory of Maximum Likelihood Procedures. We provide only a brief
discussion of the MLE. Consider the MLE $Z_n = (Z_n^{\beta\prime}, Z_n^{\alpha\prime})' = (n(\hat\beta - \beta_n(\delta))', \sqrt{n}(\hat\alpha - \alpha_n(\delta))')'$,
which is centered at the true parameter and normalized by the convergence rates.

Theorem 3.5 (Properties of MLE). Under C0-C5, and supposing that $-\ell_\infty(z)$ attains a unique
minimum in $\mathbb{R}^d$ a.s., $Z_n = O_p(1)$ and

$$Z_n \to_d Z = \arg\inf_{z \in \mathbb{R}^d} -\ell_\infty(z).$$

In particular, $Z_n^\alpha \to_d Z^\alpha = J^{-1}W \stackrel{d}{=} N(0, J^{-1})$, $Z_n^\beta \to_d Z^\beta = \arg\min_{u \in \mathbb{R}^{d_\beta}} -\ell_{2\infty}(u)$, and $Z^\beta$ and
$Z^\alpha$ are independent.

The proof is given in the technical report (Chernozhukov and Hong (2003)). The limit variable
is an argmin of a limit likelihood, which inherits the discontinuities of the finite sample likelihood.
Due to asymptotic independence of the information about the shape parameter from the information
about the location parameter, the MLEs for these parameters are asymptotically independent. In
the boundary models, the limit result can be stated more explicitly for $\beta$ as follows:

$$n(\hat\beta - \beta_n(\delta)) \to_d Z^\beta = \arg\inf_u \{-\exp(u'm) \text{ such that } J_i \ge \Delta(X_i)'u, \text{ for all } i \ge 1\}$$
$$= \arg\sup_u \{u'm \text{ such that } J_i \ge \Delta(X_i)'u, \text{ for all } i \ge 1\}.$$

This result generalizes the results of Donald and Paarsch (1993a) and Smith (1994). Note that the
solutions of linear programs like these are unique under fairly weak conditions.16

16 For example, a very simple sufficient condition for almost sure uniqueness is that there is one continuously
distributed element in $\Delta(X_i)$ and that $\Delta(X_i)$ has a nondegenerate distribution, cf. Portnoy (1991) for a related
problem. When $\Delta(X_i)$ has discrete support, the stated limit result coincides with the result of Donald and Paarsch
(1993a), who show that uniqueness holds if $\Delta(X_i)$ has a nondegenerate distribution (assumed in C3).

Remark 3.6 (Asymptotic Non-Sufficiency of MLEs). It is important to note here that
the posterior means and medians are generally not equal to the bias-corrected MLE. Consider the
example of Section 3.2, where

$$\ell_\infty(z) = e^{E(X)'z}\,1[\Gamma_i \ge X_i'z, \text{ for all } i \ge 1].$$

The limit maximum likelihood variable $Z$ maximizes $\ell_\infty(z)$, which is equivalent to maximizing
$E(X)'z$ subject to the constraint $\Gamma_i \ge X_i'z$, for all $i \ge 1$. In the no covariate case, the limit
MLE $Z$ maximizes over $z$ such that $\Gamma_i \ge z$, thus $Z = \min\{\Gamma_i, i \ge 1\}$ and $\ell_\infty(z) = e^{z}1(z \le Z)$,
implying sufficiency of $Z$. If $Z$ is sufficient, then the limit optimal Bayes estimators are all some
shift transformations of $Z$ by the well-known Rao-Blackwell argument. This raises the question of
whether $Z$ is a sufficient statistic for $\ell_\infty(z)$ in the general regression case. Taking the example with
$X = (1, \tilde{X})'$ where $\tilde{X}$ is continuous, it is easy to see that

$$\ell_\infty(z) \ne e^{E(X)'z}\,1(X_i'z \le X_i'Z, \text{ for all } i \ge 1) \quad \text{with strictly positive probability,}$$

implying that $Z$ is not sufficient for $\ell_\infty(z)$ even conditional on covariates. Thus, the limit likelihood-based
Bayes estimators of Theorem 3.2 are generally not nonrandom functions of the limit MLE $Z$.


4.  Computational  Experiments 

4.1.  Monte Carlo Design. We used a simple procurement auction model similar to that in section
2.1, where we set $\theta_2 = 1$ and $\theta_1 = \exp(\beta_0 + \beta_1 X)$, with $X \sim U(0,1)$, $\beta_0 = 1$, $\beta_1 = 1$, $m = 3$. We take
$n = 100$ and $n = 400$, which are close to practical sample sizes encountered in empirical work on
auctions. We used the parameter space $\mathcal{B} = [\beta_0 \pm 5] \times [\beta_1 \pm 5]$ and a flat prior to compute the Bayes
estimates. The starting value was set to 0 in the computation of the estimates. The computations
were performed using the canonical random-walk MCMC algorithm described on p. 245 in Robert
and Casella (1998).17
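
A minimal Python sketch of a random-walk Metropolis sampler of the kind described above. It assumes a hypothetical one-sided model $Y = \exp(\beta_0 + \beta_1 X) + \epsilon$ with $\epsilon \sim \mathrm{Exp}(1)$ in place of the auction model of section 2.1, whose likelihood is not restated here. The function names, the initial proposal scale 0.1, and the multiplicative tuning rule are illustrative assumptions and not the authors' C++ implementation; the flat prior on a box, the zero starting value, the 20,000 + 20,000 draws, and the tuning every 200 draws follow the text.

```python
import numpy as np

def log_lik(beta, y, x):
    """Log-likelihood of the assumed toy model y = exp(b0 + b1*x) + e, e ~ Exp(1)."""
    resid = y - np.exp(beta[0] + beta[1] * x)
    if np.any(resid < 0.0):            # density is zero to the left of the boundary
        return -np.inf
    return -np.sum(resid)

def rw_metropolis(y, x, lower, upper, start, n_burn=20_000, n_keep=20_000,
                  adapt_every=200, target_accept=0.5, seed=0):
    """Random-walk Metropolis with a flat prior on the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    cur = np.asarray(start, dtype=float)
    cur_ll = log_lik(cur, y, x)        # must be finite at the starting value
    scale, n_acc = 0.1, 0              # assumed initial proposal scale
    draws = np.empty((n_keep, cur.size))
    log_liks = np.empty(n_keep)
    for t in range(n_burn + n_keep):
        prop = cur + scale * rng.standard_normal(cur.size)
        if np.all(prop >= lower) and np.all(prop <= upper):
            prop_ll = log_lik(prop, y, x)
            if np.log(rng.uniform()) < prop_ll - cur_ll:
                cur, cur_ll = prop, prop_ll
                n_acc += 1
        if t < n_burn and (t + 1) % adapt_every == 0:
            # Burn-in only: widen the kernel if accepting too often, narrow otherwise.
            scale *= np.exp(n_acc / adapt_every - target_accept)
            n_acc = 0
        if t >= n_burn:
            draws[t - n_burn] = cur
            log_liks[t - n_burn] = cur_ll
    return draws, log_liks

# Simulated data from the assumed toy design (not the paper's auction design).
data_rng = np.random.default_rng(1)
n = 100
x = data_rng.uniform(size=n)
y = np.exp(1.0 + 1.0 * x) + data_rng.exponential(size=n)
beta0 = np.array([1.0, 1.0])

draws, log_liks = rw_metropolis(y, x, beta0 - 5.0, beta0 + 5.0, start=[0.0, 0.0])
post_mean = draws.mean(axis=0)
post_median = np.median(draws, axis=0)
mle = draws[np.argmax(log_liks)]       # argmax over the grid of MCMC draws
ci_lo, ci_hi = np.quantile(draws, [0.05, 0.95], axis=0)
```

The retained draws give the posterior mean, the posterior median, an MLE taken as the argmax over the visited points, and a 90% interval from the posterior quantiles, all from a single chain.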

4.2.  Quality  of  Estimation  Procedures.  We  compare  the  performance  of 


(1)  the  posterior  median, 

(2)  the  posterior  mean, 

(3)  the  posterior  mode  (MLE), 


across  different  risk  measures.    Here  the  MLE  is  computed  by  taking  the  argmax  over  the  grid 
generated  by  the  MCMC  sequence.18 

The  results  given  in  Table  1  and  Table  2  show  that:  (a)  the  posterior  median  is  the  best  under 
the  mean  absolute  deviation  loss,  (b)  the  posterior  mean  is  the  best  under  the  mean  squared  loss, 
(c)  the  MLEs  do  better  under  the  mean  10-th  power  loss  function.  Thus,  it  appears  that  all  of  the 
likelihood  procedures  perform  quite  well  relative  to  some  risk  measure. 
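
Findings (a) and (b) can be illustrated in a toy setting where everything is available in closed form. The sketch below assumes a no-covariate boundary model $Y_i = \theta_0 + \epsilon_i$ with $\epsilon_i \sim \mathrm{Exp}(1)$ and a flat prior on an unbounded parameter space, so that the posterior of $\theta$ is $\min_i Y_i$ minus an $\mathrm{Exp}(1)$ variable divided by $n$. This is not the design of Tables 1 and 2; the variable names are illustrative, and the 10-th power comparison in (c) is not covered.

```python
import numpy as np

rng = np.random.default_rng(0)
theta0, n, reps = 0.0, 100, 20_000

# Minimum of each simulated sample; the posterior is theta = y_min - Exp(1)/n.
y_min = theta0 + rng.exponential(size=(reps, n)).min(axis=1)

estimates = {
    "posterior median": y_min - np.log(2.0) / n,
    "posterior mean": y_min - 1.0 / n,
    "posterior mode (MLE)": y_min,
}

# Risk of the normalized error n * (estimate - theta0) under two loss functions.
mad = {k: np.mean(np.abs(n * (v - theta0))) for k, v in estimates.items()}
mse = {k: np.mean((n * (v - theta0)) ** 2) for k, v in estimates.items()}

best_mad = min(mad, key=mad.get)
best_mse = min(mse, key=mse.get)
```

In this toy model the normalized error of the MLE is exactly $\mathrm{Exp}(1)$, so each Bayes estimator is a fixed shift of the MLE and the ranking under each loss reduces to choosing the best shift.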


17In the implementation, the first 20,000 draws form a "burn-in stage", during which the variance of the
transition kernel is adjusted every 200 draws in order to keep the rejection rate near .5. An additional 20,000 draws
were then made with a fixed variance and used in the computation of estimates. The C++ implementation is available
from the authors.

18We also tried the approximate MLE, defined as a BE under a (truncated) 10-th power loss function, but the
performance of the approximate and exact MLEs coincided up to many digits and is not reported separately. Other
loss functions that approximate the delta function can also be used to approximate MLEs by some BE.



4.3.  Quality  of  Inferential  Procedures.  In  the  next  step,  we  compare  the  performance  of  several 
inference  methods,  focusing  on  the  coverage  properties  of  the  confidence  intervals.  We  compare 

(1)  the  confidence  intervals  based  on  the  posterior  quantiles, 

(2)  the  percentile  confidence  intervals  based  on  the  parametric  bootstrap  of  the  posterior  mean, 

(3)  the percentile confidence intervals based on the parametric bootstrap of the MLE,

(4)  the  percentile  confidence  intervals  based  on  the  subsampling  of  the  posterior  mean  (using 
1/4  x  n  as  the  subsample  size), 

(5)  the  simulation  of  the  limit  distribution  of  MLE  and  other  estimators  as  described  in  Remark 
3.1  (using  b  =  n). 

Results  reported  in  Table  3  indicate  that  the  intervals  based  on  the  posterior  quantiles  and  simulation 
of  the  limit  distribution  perform  nearly  as  well  as  the  parametric  bootstrap,  while  subsampling 
performs  worse  than  any  of  them.  The  confidence  intervals  based  on  the  posterior  quantiles  also 
appear  to  be  the  shortest  on  average.  Given  that  the  posterior  intervals  are  the  least  expensive  to 
compute,  they  should  be  preferred.  The  subsampling  is  less  expensive  than  the  parametric  bootstrap 
and  is  probably  more  robust  for  inference  purposes  under  local  misspecification. 
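
The coverage of intervals based on posterior quantiles can be checked by simulation in a case where the posterior is known in closed form. The sketch below assumes a no-covariate boundary model $Y_i = \theta_0 + \epsilon_i$, $\epsilon_i \sim \mathrm{Exp}(1)$, with a flat prior on an unbounded parameter space, for which the posterior $\tau$-quantile is $\min_i Y_i + \ln(\tau)/n$. It is an illustration under these assumptions, not the design behind Table 3, and `posterior_quantile` is a hypothetical helper name.

```python
import numpy as np

def posterior_quantile(y, tau):
    """tau-quantile of the flat-prior posterior of theta when y = theta + Exp(1) noise."""
    return y.min(axis=-1) + np.log(tau) / y.shape[-1]

rng = np.random.default_rng(0)
theta0, n, reps = 2.0, 100, 20_000
y = theta0 + rng.exponential(size=(reps, n))

lo = posterior_quantile(y, 0.05)
hi = posterior_quantile(y, 0.95)

coverage = np.mean((lo <= theta0) & (theta0 <= hi))   # nominal level is 0.90
avg_length = np.mean(hi - lo)                         # equals ln(19) / n here
```

Because $n(\min_i Y_i - \theta_0)$ is exactly $\mathrm{Exp}(1)$ in this model, the interval between the 5% and 95% posterior quantiles covers $\theta_0$ with probability 0.90 at every sample size, so the simulated coverage should differ from 0.90 only by Monte Carlo error.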

In  terms  of  computational  expense,  computation  of  the  posterior  quantiles  takes  less  than  1  minute 
on  a  Pentium  III  PC.  Simulation  of  the  limit  distribution  is  roughly  twice  as  expensive  (because 
the  limit  expressions  are  simple  transformations  of  linear  functions  and  do  not  contain  nonlinear 
expressions).  We  used  200  bootstrap  draws  and  the  full  sample  estimate  as  the  starting  value  in 
the  MCMC  algorithm  (which  reduces  the  number  of  MCMC  draws  needed  in  the  re-computation  of 
the  estimates).  Using  this  implementation,  200  bootstrap  draws  take  between  7  and  30  minutes  for 
samples  n  =  100  and  n  =  400.  (Thus,  1000  bootstrap  draws  take  up  to  150  minutes  for  n  =  400). 
The  subsampling  takes  about  1/5-th  of  the  time  of  the  parametric  bootstrap,  using  the  same  number 
of  draws.  The  entire  Monte  Carlo  work  took  several  weeks  of  computer  time. 

5.  Conclusion 

We studied estimation and inference in a general model in which the conditional density of the
dependent  variable  jumps  at  a  location  that  is  parameter  dependent.  This  model  includes  a  number 
of  interesting  economic  models  discussed  in  the  recent  literature  of  structural  estimation.  We  derived 
the  large  sample  theory  of  a  variety  of  likelihood  based  procedures,  and  offered  an  array  of  useful 
and  practical  inference  techniques,  including  Wald  type  and  Bayes  type  inference  methods.  The 
results  provide  a  theoretical  and  practical  solution  to  an  important  econometric  problem. 

Appendix  A.  Regularity  Conditions  C0-C5  and  D1-D3 

Notation. Throughout the paper, $c$ and const denote generic positive constants unless stated otherwise;
$\to_p$ and $\to_d$ denote convergence in probability and distribution, respectively; $\|x\|$ is the usual Euclidean
norm $\sqrt{x'x}$, and $|x|$ is used to denote the supremum norm, i.e. $|x| = \sup_{j \le k} |x_j|$ where $x = (x_1, \ldots, x_k)$.
Note the densities of interest $f(\epsilon|x,\gamma)$ are discontinuous at $\epsilon = 0$ and are not differentiable in $\epsilon$ at $\epsilon = 0$. To
simplify notation, we use $\partial f(\epsilon|x,\gamma)/\partial\epsilon$ to denote the usual partial derivative when $\epsilon \ne 0$, and also use it to
denote the directional partial derivative $\partial f(0^+|x,\gamma)/\partial\epsilon$ when $\epsilon = 0$, etc. Also, $B_\delta(\gamma)$ denotes a closed ball
at $\gamma$ with radius $\delta$ as measured by $|\cdot|$.

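Conditions C0-C5 The following conditions apply to $x$ in $X$, $\epsilon \in \mathbb{R}$. Conditions C0-C3 apply to any
$\gamma = (\beta,\alpha)$ in $\mathcal{G}$. Conditions C4 and C5 apply to any $\gamma = (\beta,\alpha)$ and $\bar\gamma = (\bar\beta,\bar\alpha)$ in $B_\delta(\gamma_0)$ for some $\delta > 0$.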

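C0 For each $\gamma$, $(Y_i, X_i)$ is an iid sequence of vectors in $\mathbb{R} \times \mathbb{R}^k$, defined on probability space $(\Omega, \mathcal{F}, P_\gamma)$.
$\mathcal{G} \subset \mathbb{R}^d$ is a compact convex set such that $\gamma_0 \in$ interior $\mathcal{G}$; for any $\gamma$ and $\bar\gamma \ne \gamma$,
$P_\gamma\{f(Y_i - g(X_i,\beta)|X_i,\gamma) \ne f(Y_i - g(X_i,\bar\beta)|X_i,\bar\gamma)\} > 0$.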

C1 $X_i$ has cdf $F_X$, that does not depend on $\gamma$, and has compact support $X$. In addition to (2.2),
uniformly in $\gamma$ and $x$, we have either

(i) the two-sided model: $p(x,\gamma) > q(x,\gamma) > c > 0$, or

(ii) the one-sided model: $p(x,\gamma) > c > 0$ and $f(\epsilon|x,\gamma) = q(x,\gamma) = 0$, for all $\epsilon < 0$.

C2 Without loss of generality, the density $f(\epsilon|x,\gamma)$ is upper-semicontinuous at $\epsilon = 0$ for each $x$ and
$\gamma$. The density $f(\epsilon|x,\gamma)$ is bounded from above uniformly in $(\epsilon,x,\gamma)$. $f(\epsilon|x,\gamma)$ has continuous first partial
derivative in $\epsilon$ (except at $\epsilon = 0$) that is bounded uniformly in $(\epsilon,x,\gamma)$; $f(\epsilon|x,\gamma)$ has continuous first and
second partial derivatives in $\gamma$ that are bounded uniformly in $(\epsilon,x,\gamma)$. The density and the derivatives specified
above are continuous in $x$ on $X$ for each $\epsilon$ and $\gamma$. Lastly, $\sup_\gamma E_X \int |\frac{\partial}{\partial\gamma} f(y - g(X,\beta)|X;\gamma)|\,dy < \infty$.

C3 The function $g(x,\beta)$ has two continuous and bounded derivatives in $\beta$, uniformly in $x$ and $\beta$, and
$E \frac{\partial g(X,\beta)}{\partial\beta} \frac{\partial g(X,\beta)}{\partial\beta'}$ is positive definite uniformly in $\beta$. The function and the derivatives specified above are
continuous in $x$ on $X$ for each $\beta$.

C4  When  the  nuisance  parameter  a  is  present,  for  U  (7)  =  In/  (Y,  —  g(Xi,p)  \Xi,^)  where  7  =  (J3,a), 

uniformly  in  7  and  7  either  (a)  EPl  [-§^U  (7)  JWj  (7)']  is  positive  definite  and  bounded  or  (b)  if  -^U  (7)  =  0 
P7-a.s.,  then  £^-,[^'1  (7)  ^U  (7)']  is  positive  definite  and  bounded. 

C5  In  the  two-sided  model  Cl.i,  the  terms 

\%lnf(Yi  -  g{XiJ)\Xin)\,  \\^lnf  (y*  -*(X*,0)|**,7)  f,  W^rlnf  (Yt  -  g(Xi,p)\Xi^)\\      (A.l) 

are  bounded  respectively  by  Cj(f,,Xj),  j  =  1,  2,3,  for  all  Y,  —  g(Xi,/3)  6  R  \  {0},  uniformly  in  76  Bs(~)o), 
where  sup7  EPl  Cj (e, ;,  Xi )  <  00  for  j  =  1,2,3.  Similarly,  for  the  one-sided  model  Cl.ii  the  terms  in  (A.l) 
are  bounded  respectively  by  Cj(e,,X,),  j  =  1,  2,3,  for  all  Yi  —  p(X,,/3)  >  0,  uniformly  in  7  e  #,5(70),  where 
sup7  EPy  Cj  (e, ■„  Xi )  <  00  for  j  =  1 ,  2, 3. 

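Lemma A.1 (Important Constants). The conditions C0-C3 imply that there are finite constants $\bar f$, $\bar f'$,
$\bar f''$, $\underline{f}$, $\bar g$, $\bar g'$, $\bar g''$ such that

$$\sup_{\epsilon, x, \gamma} f(\epsilon|x,\gamma) \le \bar f, \quad \sup_{\epsilon, x, \gamma} \Big\|\frac{\partial}{\partial(\epsilon,\gamma)} f(\epsilon|x,\gamma)\Big\| \le \bar f', \quad \sup_{\epsilon, x, \gamma} \Big\|\frac{\partial^2}{\partial\gamma\partial\gamma'} f(\epsilon|x,\gamma)\Big\| \le \bar f'',$$

$$\sup_{\beta \in \mathcal{B}, x \in X} |g(x,\beta)| \le \bar g, \quad \sup_{\beta \in \mathcal{B}, x \in X} \Big\|\frac{\partial}{\partial\beta} g(x,\beta)\Big\| \le \bar g', \quad \sup_{\beta \in \mathcal{B}, x \in X} \Big\|\frac{\partial^2}{\partial\beta\partial\beta'} g(x,\beta)\Big\| \le \bar g'',$$

and also that for some $\delta > 0$ and $V(0) = [-\delta,\delta] \setminus \{0\}$ in case of C1(i) and $V(0) = (0,\delta]$ in case of C1(ii)

$$\inf_{\epsilon \in V(0), x \in X, \gamma \in \mathcal{G}} f(\epsilon|x,\gamma) \ge \underline{f} > 0.$$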

Remark A.1. The regularity conditions are summarized in Section 2.2. The parameters $\gamma$ include the
location parameters $\beta$ and the shape parameters $\alpha$. If $\beta$ is known, the inference about $\alpha$ is regular. Thus,
the conditions C0-C5 are a mixture of non-regular and regular assumptions, as in Ibragimov and Has'minskii
(1981a) and van der Vaart (1999), Ch. 7. Condition C1(ii) allows for the boundary model, where the density
is zero to the left side of the jump and is positive on the right side. Condition C1(i) allows for the two-sided
model where the density is positive on both sides. Conditions C3-C5 are common in nonlinear analysis.

Conditions D1-D3 The prior $\mu : \mathcal{G} \to \mathbb{R}_+$ and the loss function $\rho : \mathbb{R}^{d_\alpha + d_\beta} \to \mathbb{R}_+$ have the following
properties:

D1 $\mu(\cdot) > 0$ is continuous on $\mathcal{G}$,

D2 $\rho(\cdot) \ge 0$ and $\rho(z) = 0$ iff $z = 0$, $\rho$ is convex,

D3 $\rho(z)$ is dominated by a polynomial of $|z|$ as $|z| \to \infty$.

Remark A.2. These are standard assumptions on the loss function $\rho$ and the prior $\mu$, see for example
Ibragimov and Has'minskii (1981b). Since BE's become essentially uncomputable when $\rho$ is not convex, we
do not consider non-convex loss functions for pragmatic reasons. However, the proofs of Theorems 3.1-3.4
do not rely upon the convexity assumption and the results apply more generally to other loss functions
specified in Ibragimov and Has'minskii (1981b).


Appendix  B.  Proofs  for  Section  3 
[N.B. In the proofs we extensively use the constants defined in Lemma A.1]

B.1.  Proof of Theorem 3.1. In the proof we set the local parameter sequence $\gamma_n = \gamma_0$. Considering a
general sequence does not change the argument but complicates notation.

Following Ibragimov and Has'minskii (1981a), we split the log likelihood ratio process

$$Q_n(z) = \ln \ell_n(z) = \ln L_n(\gamma_0 + H_n z)/L_n(\gamma_0)$$

into the continuous part $Q^c_n(z)$ and the piece-wise constant part $Q^d_n(z)$, and analyze each part separately.
Our goal is to show that $Q_n(z)$ converges in distribution in the finite-dimensional sense to

$$Q_\infty(z) = Q^c_\infty(z) + Q^d_\infty(z),$$

where

$$Q^c_\infty(z) = W'v - \tfrac{1}{2} v'Jv + m'u, \qquad Q^d_\infty(z) = \int_{\mathbb{R} \times X} l_u(j,x)\,dN(j,x),$$

where each term is defined in Theorem 3.1. Given this result, the finite-dimensional limit of $\ell_n(z)$ is

$$\ell_\infty(z) = \exp(Q_\infty(z)).$$
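For $z = (u', v')'$, using that $\epsilon_i = Y_i - g(X_i,\beta_0)$,

$$\begin{aligned}
Q_n(z) &= \underbrace{\sum_{i=1}^n r_{in}(z) \times \big[1(\epsilon_i > \{\Delta_n(X_i,u)/n\} \vee 0) + 1(\epsilon_i < \{\Delta_n(X_i,u)/n\} \wedge 0)\big]}_{Q^c_{1n}(z)} \\
&\quad + \underbrace{\sum_{i=1}^n (r_{in}(z) - \bar r_{in}(z)) \times \big[1(0 < \epsilon_i \le \Delta_n(X_i,u)/n) + 1(0 > \epsilon_i \ge \Delta_n(X_i,u)/n)\big]}_{Q^c_{2n}(z)} \\
&\quad + \underbrace{\sum_{i=1}^n \bar r_{in}(z) \times \big[1(0 < \epsilon_i \le \Delta_n(X_i,u)/n) + 1(0 > \epsilon_i \ge \Delta_n(X_i,u)/n)\big]}_{Q^d_n(z)} \\
&= Q^c_n(z) + Q^d_n(z), \quad \text{where } Q^c_n(z) = Q^c_{1n}(z) + Q^c_{2n}(z),
\end{aligned}$$

$$r_{in}(z) = \ln \frac{f(Y_i - g(X_i,\beta_0 + u/n)\,|\,X_i,\beta_0 + u/n,\alpha_0 + v/\sqrt{n})}{f(Y_i - g(X_i,\beta_0)\,|\,X_i,\gamma_0)} = \ln \frac{f(\epsilon_i - \Delta_n(X_i,u)/n\,|\,X_i,\beta_0 + u/n,\alpha_0 + v/\sqrt{n})}{f(\epsilon_i\,|\,X_i,\gamma_0)},$$

$$\bar r_{in}(z) = \ln \frac{q(X_i)}{p(X_i)}\,1(0 < \epsilon_i) + \ln \frac{p(X_i)}{q(X_i)}\,1(0 > \epsilon_i),$$

$$\Delta_n(x,u) = n\,(g(x,\beta_0 + u/n) - g(x,\beta_0)).$$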


The convergence analysis of the continuous part $Q^c_n(z)$ is standard. In sharp contrast, the behavior of the
discontinuous part $Q^d_n(z)$ differs from that of $Q^c_n(z)$, and is analyzed using point process methods.

Also, in the above expressions and all proofs we use the algebraic rules of Ibragimov and Has'minskii (1981a)
for working with $\infty$'s defined in Theorem 3.1. This is done to include the proof for the boundary model as
a special case. In particular, the expressions involving $1(0 > \epsilon_i \ge \ldots)$ cancel, since in the boundary models
$\epsilon_i > 0$. Also in the boundary model $r_{in}(z) = \bar r_{in}(z) = -\infty$ when $0 < \epsilon_i \le \Delta_n(X_i,u)/n$, so that

$$Q^c_{2n}(z) = 0.$$

Thus, the term $Q^c_{2n}(z)$ is only non-zero for the two-sided model. Further details follow.

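Part I obtains the finite-dimensional limit of $Q^c_n(z)$. The proof method is standard for the smooth likelihood
analysis.

Application of Taylor expansion to each $r_{in}(z)$, $i = 1, \ldots, n$, so that the expanded terms are iid, followed
by application of the Markov LLN and Chebyshev inequality, yields for a given $z$ [see Addendum]

$$Q^c_{1n}(z) = -u' E\,\Delta(X_i) \frac{f'(\epsilon_i|X_i,\gamma_0)}{f(\epsilon_i|X_i,\gamma_0)} + v' \underbrace{\frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{\partial}{\partial\alpha} \ln f(\epsilon_i|X_i,\gamma_0)}_{= W_n} + \frac{1}{2} v' \underbrace{E \frac{\partial^2 \ln f(\epsilon_i|X_i,\gamma_0)}{\partial\alpha\,\partial\alpha'}}_{= -J} v + o_p(1),$$

where $\Delta(X_i) = \partial g(X_i,\beta_0)/\partial\beta$ and $f'$ is the derivative in $\epsilon$. The information matrix equality for $\alpha$ implies
$-J = E \frac{\partial^2 \ln f(\epsilon_i|X_i,\gamma_0)}{\partial\alpha\,\partial\alpha'} = -E \frac{\partial \ln f(\epsilon_i|X_i,\gamma_0)}{\partial\alpha} \frac{\partial \ln f(\epsilon_i|X_i,\gamma_0)}{\partial\alpha}'$, and the CLT gives $W_n \to_d W = N(0, J)$. Also it follows by C2$^{19}$

$$m = E\,\Delta(X_i)(p(X_i) - q(X_i)).$$

Therefore, the finite-dimensional limit of $Q^c_{1n}(z)$ is given by

$$Q^c_\infty(z) = u'm + W'v - \tfrac{1}{2} v'Jv.$$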
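
The step from the first term of the expansion to $m$ can be spelled out as follows. This is a sketch that takes $p(x) = f(0^+|x,\gamma_0)$ and $q(x) = f(0^-|x,\gamma_0)$, as their roles in (B.5) indicate, and assumes that the density vanishes as $\epsilon \to \pm\infty$:

```latex
\begin{aligned}
E\!\left[\frac{f'(\epsilon_i|X_i,\gamma_0)}{f(\epsilon_i|X_i,\gamma_0)} \,\Big|\, X_i = x\right]
  &= \int_{-\infty}^{0} f'(\epsilon|x,\gamma_0)\,d\epsilon + \int_{0}^{\infty} f'(\epsilon|x,\gamma_0)\,d\epsilon \\
  &= f(0^-|x,\gamma_0) - f(0^+|x,\gamma_0) = q(x) - p(x), \\
-u' E\,\Delta(X_i)\frac{f'(\epsilon_i|X_i,\gamma_0)}{f(\epsilon_i|X_i,\gamma_0)}
  &= u' E\,\Delta(X_i)\,\bigl(p(X_i) - q(X_i)\bigr) = u'm .
\end{aligned}
```

The drift term $u'm$ therefore comes entirely from the jump in the density: if $p = q$ it vanishes and only the regular terms in $v$ remain.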

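It remains to show $Q^c_{2n}(z) = o_p(1)$. In the one-sided case $Q^c_{2n}(z) = 0$, hence consider the two-sided case.
Note that by assumptions C1-C3, for any compact set $Z$, as $n \to \infty$,

$$\left| \ln \frac{f(\epsilon - \Delta_n(x,u)/n\,|\,x,\beta_0 + u/n,\alpha_0 + v/\sqrt{n})}{f(\epsilon|x,\gamma_0)} - \ln \frac{q(x)}{p(x)} \right| \le 2 \times (\bar f'/\underline{f}) \times \bar g' \|z\|/\sqrt{n} \qquad (B.1)$$

uniformly in $\{\epsilon, z, x \in \mathbb{R}_+ \times Z \times X : \Delta_n(x,u) > 0,\ 0 < \epsilon \le \Delta_n(x,u)/n\}$. Likewise

$$\left| \ln \frac{f(\epsilon - \Delta_n(x,u)/n\,|\,x,\beta_0 + u/n,\alpha_0 + v/\sqrt{n})}{f(\epsilon|x,\gamma_0)} - \ln \frac{p(x)}{q(x)} \right| \le 2 \times (\bar f'/\underline{f}) \times \bar g' \|z\|/\sqrt{n} \qquad (B.2)$$

uniformly in $\{\epsilon, z, x \in \mathbb{R}_- \times Z \times X : \Delta_n(x,u) < 0,\ 0 > \epsilon \ge \Delta_n(x,u)/n\}$. Thus

$$\sup_{z \in Z} |Q^c_{2n}(z)| \le 2 \times (\bar f'/\underline{f}) \times \bar g' \times \|Z\|/\sqrt{n} \times \sum_{i=1}^n 1(|\epsilon_i| \le K/n) = O_p(1/\sqrt{n}) \qquad (B.3)$$

for some constant $K = \|Z\| \times \bar g'$, where $\|Z\| = \sup\{\|z\| : z \in Z\}$, where $K$ is finite by C3. The $O_p(1/\sqrt{n})$
conclusion is by C2:

$$E \sum_{i=1}^n 1(|\epsilon_i| \le K/n) \le 2 \bar f K < \infty. \qquad (B.4)$$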


Part II obtains the finite-dimensional limit of $Q^d_n(z)$. Recall

$$Q^d_n(z) = \sum_{i=1}^n \left[ \ln \frac{q(X_i)}{p(X_i)}\,1(0 < n\epsilon_i \le \Delta_n(X_i,u)) + \ln \frac{p(X_i)}{q(X_i)}\,1(0 > n\epsilon_i \ge \Delta_n(X_i,u)) \right].$$

By C2 and C3

$$E \sum_{i=1}^n \Big( \big|1(0 < n\epsilon_i \le \Delta_n(X_i,u)) - 1(0 < n\epsilon_i \le \Delta(X_i)'u)\big| + \big|1(0 > n\epsilon_i \ge \Delta_n(X_i,u)) - 1(0 > n\epsilon_i \ge \Delta(X_i)'u)\big| \Big) \le 2 \bar f \bar g'' \|u\|^2/n = o(1),$$

where $\Delta(X_i) = \partial g(X_i,\beta_0)/\partial\beta$, which implies that for given $z$

$$Q^d_n(z) = \sum_{i=1}^n \left[ \ln \frac{q(X_i)}{p(X_i)}\,1(0 < n\epsilon_i \le \Delta(X_i)'u) + \ln \frac{p(X_i)}{q(X_i)}\,1(0 > n\epsilon_i \ge \Delta(X_i)'u) \right] + o_p(1).$$

Now note that $(Q^c_n(z_j), j \le l)$ and $(Q^d_n(z_j), j \le l)$, for any finite $l$, are asymptotically independent. This
follows by applying a standard argument concerning the independence of minimal order statistics and sample
averages of general form, see for example Resnick (1986) or Lemma 21.19 in van der Vaart (1999) [Addendum
provides the proof for completeness].


19Recall that for a density function $f$, which is everywhere continuously differentiable except at 0 with an integrable
derivative, $\int_{\mathbb{R}} f'(u)\,du = -f(0^+) + f(0^-)$.



The next step is to obtain the finite-dimensional limit of $Q^d_n$. The behavior of $Q^d_n$ is determined by the
near-to-jump observations, whose behavior is described using a point process. We split the argument in two
steps. Step 1 constructs the required point process and derives its limit. Step 2 applies Step 1 to obtain the
finite-dimensional limit of $Q^d_n$.

Step 1: Intuition for Step 1 is provided in Section 3.2 of the main text.

Define $E = \mathbb{R} \times X$. The topology on $E$ is standard, e.g. $[a,b] \times X$ is a compact subset relative to $E$.
The point process of interest is a random measure taking the following form: for any Borel subset $A$

$$\hat N(A) = \sum_{i=1}^n 1[(n\epsilon_i, X_i) \in A],$$

$\hat N$ is a random element of $M_p(E)$, the metric space of nonnegative point measures on $E$, with the metric
generated by the usual topology of vague convergence, Resnick (1987) Ch. 3. [Technical Addendum provides
a review]. We show that

$$\hat N \Rightarrow N \text{ in } M_p(E),$$

for $N$ given in Theorem 3.1. This is done in steps (a) and (b).

(a): By C1 and C2, for any $F \in \mathcal{T}$, the basis of relatively compact open sets in $E$ (finite unions and
intersections of open bounded rectangles in $E$), $\lim_{n \to \infty} E \hat N(F) = \lim_{n \to \infty} n P((n\epsilon_i, X_i) \in F)$

$$= \int_F [p(x) 1(u > 0)\,du + q(x) 1(u < 0)\,du] \times dF_X(x) = m(F) < \infty, \qquad (B.5)$$

where measure $m$ is defined as $m(du,dx) = [p(x) 1(u > 0)\,du + q(x) 1(u < 0)\,du] \times dF_X(x)$. Since $\{(n\epsilon_i, X_i) \in
F\}$ are independent across $i$ by C0, by Meyer's Theorem, Meyer (1973),

$$\lim_{n \to \infty} P(\hat N(F) = 0) = e^{-m(F)}. \qquad (B.6)$$

Statements (B.5) and (B.6) imply by Kallenberg's Theorem - Resnick (1987), Proposition 3.22 - that
$\hat N \Rightarrow \tilde N$ in $M_p(E)$, where $\tilde N$ is a Poisson point process with the mean intensity measure $m(\cdot)$.
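
Statements (B.5) and (B.6) can be checked numerically for a simple set $F$. The sketch below assumes a hypothetical one-sided design with $X \sim U(0,1)$, $\epsilon \mid X \sim \mathrm{Exp}$ with rate $p(X) = 1 + X$ (so the density at $0^+$ is $p(x)$ and $q = 0$), and $F = (0, a) \times [0,1]$; none of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, a = 400, 5_000, 1.5

def p(x):
    """Assumed height of the conditional density just right of the jump."""
    return 1.0 + x

x = rng.uniform(size=(reps, n))
eps = rng.exponential(scale=1.0 / p(x))      # eps | x ~ Exp(rate p(x))

# Point-process count of (n * eps_i, X_i) falling in F = (0, a) x [0, 1].
counts = np.sum(n * eps < a, axis=1)

m_F = a * 1.5                                # m(F) = a * E p(X), with E(1 + X) = 1.5
mean_count = counts.mean()                   # compare with m(F), as in (B.5)
void_freq = np.mean(counts == 0)             # compare with exp(-m(F)), as in (B.6)
```

With these settings the simulated mean count should be close to $m(F) = 2.25$ and the frequency of empty $F$ close to $e^{-2.25}$, up to Monte Carlo and finite-$n$ error.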

(b): Next we show that $\tilde N$ has the same distribution as $N$ given in Theorem 3.1. First, consider the
canonical Poisson processes $N_0$ and $N_0'$ with points $\{\Gamma_i\}$ and $\{\Gamma_i'\}$ defined in Theorem 3.1. $N_0$ has the
mean measure $m_0(du) = du$ on $(0,\infty)$, and $N_0'$ has the mean measure $m_0'(du) = du$ on $(-\infty,0)$, see Resnick
(1987), p. 138. Because $N_0$ and $N_0'$ are independent, $N_1(\cdot) = N_0(\cdot) + N_0'(\cdot)$ is a Poisson point process
with mean measure $m_1(du) = du$ on $\mathbb{R}$, by definition of the Poisson process, see Resnick (1987), p. 130.
Because $\{X_i, X_i'\}$ are i.i.d. and independent of $\{\Gamma_i, \Gamma_i'\}$, by Proposition 3.8 in Resnick (1987), the composed
process $N_2$ with points $(\{\Gamma_i, X_i\}, \{\Gamma_i', X_i'\}, i \ge 1)$ is a Poisson process with the mean measure $m_2(du,dx) =
[1(u > 0)\,du + 1(u < 0)\,du] \times F_X(dx)$ on $\mathbb{R} \times X$. Finally, $N$ with the points $\{T(\Gamma_i, X_i), T(\Gamma_i', X_i')\}$, where
$T : (u,x) \mapsto (1(u > 0) u/p(x) + 1(u < 0) u/q(x), x)$, is a Poisson process with the desired mean measure
$m(du,dx) = m_2 \circ T^{-1}(du,dx) = [p(x) 1(u > 0) + q(x) 1(u < 0)]\,du \times F_X(dx)$, by Proposition 3.7 in Resnick
(1987).
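
The construction in step (b) translates directly into a simulation recipe for the limit process. The sketch below is illustrative only: it assumes $F_X$ is uniform on $[0,1]$, takes hypothetical jump heights $p(x) = 1 + x$ and $q(x) = 0.5$, and truncates each canonical process at `n_points` points, which is harmless for counts in a bounded window.

```python
import numpy as np

def simulate_limit_points(p, q, n_points, rng):
    """One draw of the points T(Gamma_i, X_i) and T(Gamma'_i, X'_i), truncated at n_points."""
    gam = np.cumsum(rng.exponential(size=n_points))        # unit-rate Poisson points on (0, inf)
    gam_neg = -np.cumsum(rng.exponential(size=n_points))   # unit-rate Poisson points on (-inf, 0)
    x_pos = rng.uniform(size=n_points)                     # marks X_i, assumed U(0, 1)
    x_neg = rng.uniform(size=n_points)                     # marks X'_i, assumed U(0, 1)
    return (gam / p(x_pos), x_pos), (gam_neg / q(x_neg), x_neg)

def p(x):
    return 1.0 + x                   # hypothetical density height right of the jump

def q(x):
    return 0.5 + 0.0 * x             # hypothetical density height left of the jump

rng = np.random.default_rng(0)
a, reps = 1.5, 5_000
right = np.empty(reps)
left = np.empty(reps)
for r in range(reps):
    (j_pos, _), (j_neg, _) = simulate_limit_points(p, q, 50, rng)
    right[r] = np.sum(j_pos <= a)    # points in (0, a] x [0, 1]
    left[r] = np.sum(j_neg >= -a)    # points in [-a, 0) x [0, 1]
```

Under the stated mean measure, the count to the right of zero should average $a\,E\,p(X) = 2.25$ and the count to the left $a\,E\,q(X) = 0.75$, and each count should have variance close to its mean, as for a Poisson variable.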

Step 2: We have for $z = (u,v)$

$$Q^d_n(z) = Q^d_n(u) = \left[ \sum_{i=1}^n \ln \frac{q(X_i)}{p(X_i)}\,1[0 < n\epsilon_i \le \Delta(X_i)'u] + \sum_{i=1}^n \ln \frac{p(X_i)}{q(X_i)}\,1[0 > n\epsilon_i \ge \Delta(X_i)'u] \right] + o_p(1).$$

Ignoring the $o_p(1)$ term, write $Q^d_n(u)$ as a Lebesgue integral with respect to $\hat N$:

$$Q^d_n(u) = \int_E l_u(j,x)\,d\hat N(j,x),$$

where $l_u(j,x)$ is defined in Theorem 3.1. The convergence of this integral is implied by $\hat N \Rightarrow N$ in both the
two-sided and one-sided model:

(a) In two-sided model: By conditions C1-C3, the function $(j,x) \mapsto l_u(j,x)$ is bounded and vanishes outside
the compact set $K_u = [-\eta,+\eta] \times X$, $\eta = \sup_{x \in X} |\Delta(x)'u|$, where $\eta < \infty$ by C3. Thus $(j,x) \mapsto l_u(j,x)$
has compact support but is discontinuous when $j = 0$ and $j = \Delta(x)'u$. Define the map $T : M_p(E) \to \mathbb{R}^l$ as
$N \mapsto (\int l_{u_k}(j,x)\,dN(j,x), k \le l)$ for $l < \infty$. Hence by Proposition 3.13 in Resnick (1987) $T$ is discontinuous
at $\mathcal{D}(T) = \{N \in M_p(E) : j^N_i = 0 \text{ or } j^N_i = u_k'\Delta(x^N_i) \text{ for some } i \ge 1, k \le l\}$ where $(j^N_i, x^N_i, i \ge 1)$ denote the
points of $N$. Since $\epsilon_i$'s are absolutely continuous $P[\hat N \in \mathcal{D}(T), \text{ for some } n \ge 1] = 0$, and by definition of $N$,
$P[N \in \mathcal{D}(T)] = 0$. Therefore $\hat N \Rightarrow N$ in $M_p(E)$ implies $T(\hat N) \to_d T(N)$ by the continuous mapping theorem,
Resnick (1987), p. 153. It follows $(Q^d_n(u_k), k \le l) \to_d (Q^d_\infty(u_k), k \le l)$, where

$$Q^d_\infty(u) = \int_E l_u(j,x)\,dN(j,x).$$

(b) In one-sided model: Using the Ibragimov and Has'minskii (1981a) rules for algebraic operations with
$\infty$'s stated in Theorem 3.1, note $Q^d_n(u) = \int_E l_u(j,x)\,d\hat N(j,x)$ as a binomial random variable:
$Q^d_n(u) = -\infty$ if $\hat N(A(u)) > 0$, $Q^d_n(u) = 0$ if $\hat N(A(u)) = 0$, where $A(u) = \{(j,x) \in \mathbb{R}_+ \times X : j \le \Delta(x)'u\}$.
Also define $Q^d_\infty(u) = \int_E l_u(j,x)\,dN(j,x) = -\infty$ if $N(A(u)) > 0$, $Q^d_\infty(u) = 0$ if $N(A(u)) = 0$.
Thus, to show finite-dimensional convergence (for $\gamma_k = -\infty$ or 0):

$$\lim_{n \to \infty} P(Q^d_n(u_k) = \gamma_k, k \le l) = P(Q^d_\infty(u_k) = \gamma_k, k \le l),$$

it suffices to show $(\hat N(A(u_k)), k \le l) \to_d (N(A(u_k)), k \le l)$ for $l < \infty$. By a definition of weak convergence
of point processes, cf. Embrechts, Klüppelberg, and Mikosch (1997) p. 232, this is immediate from $\hat N \Rightarrow N$,
since by C2 and construction of $N$, $\hat N(\partial A(u_k)) = 0$ and $N(\partial A(u_k)) = 0$ a.s. ■
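
In the one-sided model the limit of the discontinuous part takes only the values $0$ and $-\infty$, and its law follows from the void probability of the Poisson process $N$ with the mean measure $m$ of (B.5) and $q = 0$. A short worked version, writing $(a)_+ = \max(a, 0)$ (notation introduced here for illustration only):

```latex
\begin{aligned}
m(A(u)) &= \int_X \int_0^\infty 1\bigl(j \le \Delta(x)'u\bigr)\, p(x)\, dj\, dF_X(x)
         = E\bigl[\,p(X)\,(\Delta(X)'u)_+\bigr], \\
P\bigl(Q^d_\infty(u) = 0\bigr) &= P\bigl(N(A(u)) = 0\bigr)
         = \exp\bigl(-E[\,p(X)\,(\Delta(X)'u)_+]\bigr), \\
P\bigl(Q^d_\infty(u) = -\infty\bigr) &= 1 - \exp\bigl(-E[\,p(X)\,(\Delta(X)'u)_+]\bigr).
\end{aligned}
```

As a check, with no covariates, $\Delta = 1$ and $p = 1$, this gives $P(Q^d_\infty(u) = 0) = e^{-u}$ for $u > 0$, which is the probability that a unit exponential variable exceeds $u$.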

B.2.  Proof of Theorem 3.2. The proof applies Theorem 1.10.2, p. 107 of Ibragimov and Has'minskii
(1981b) that states the limit distribution of BE's provided some general conditions hold.

First, BE's are measurable by Jennrich's (1969) measurability theorem since they minimize the objective
functions that are continuous in data and parameters.

Second, the following conditions (1)-(3) verify the conditions of Theorem 1.10.2, p. 107 of Ibragimov and
Has'minskii (1981b):

(1)   (a) Hölder continuity of $\ell_n^{1/2}(z)$ in expectation proved in Lemma C.2,

(b) Exponential bound on the expected likelihood tails proved in Lemma C.2: for $a' > 0$

$$E_{P_\gamma} \ell_n(z)^{1/2} \le e^{-a'(|z|-1)},$$

where function $a'(|z| - 1)$ falls to the class of functions $G$ defined on p. 41 in Ibragimov and
Has'minskii (1981b), i.e. (i) $a'(|z| - 1)$ is monotonically increasing to $\infty$ in $|z|$ on $[0,\infty)$, and (ii)
for any $N > 0$, $\lim_{|z| \to \infty} |z|^N e^{-a'(|z|-1)} = 0$.

(2)  Finite-dimensional convergence of $\ell_n(z) = \exp(Q_n(z))$ to $\ell_\infty(z) = \exp(Q_\infty(z))$: for any finite
collection $(z_j, j \le l)$

$$(\ell_n(z_j), j \le l) \to_d (\ell_\infty(z_j), j \le l),$$

(all of the terms are defined in the proof of Theorem 3.1.)

(3)  The limit Bayes problem:

$$Z = \arg\inf_{z} \int \rho(z - z') \frac{\ell_\infty(z')}{\int \ell_\infty(z'')\,dz''}\,dz',$$

is uniquely solved by a random vector $Z$, which is by D2 (since $\rho$ is convex with a unique minimum,
cf. p. 107 in Ibragimov and Has'minskii (1981b).)

and conditions D1-D3 on the loss functions $\rho$ and prior $\mu$.

It must be noted that Ibragimov and Has'minskii (1981b) impose the symmetry of $\rho$ throughout their book.
However, inspection of the proofs of Theorem 1.10.2 (and Theorem 1.5.2) reveals that the proof does not
require the symmetry and applies to the loss functions that satisfy D1-D3.

Thus, conditions (1)-(3) imply by Theorem 1.10.2 of Ibragimov and Has'minskii (1981b) our result:

$$Z_n \to_d Z.$$

Furthermore, conditions (1)-(3) imply by Theorem 1.5.2 and Theorem 1.10.2 of Ibragimov and Has'minskii
(1981b) that for any local sequence $\gamma_n(\delta) = \gamma_0 + H_n\delta$, $\delta \in \mathbb{R}^d$ and any $N > 0$

$$\lim_{H \to \infty, n \to \infty} H^N P_{\gamma_n(\delta)}\{|Z_n| > H\} = 0, \quad \text{and} \quad \lim_{n \to \infty} E_{P_{\gamma_n(\delta)}} \rho(Z_n) = E_{P_{\gamma_0}} \rho(Z) < \infty. \qquad (B.7)$$

The last result is not needed to prove Theorem 3.2, but will be used later. ■

B.3.  Proof of Theorem 3.3. To show claim 1, note under the conditions of Theorem 3.2 by (B.7)

$$\lim_{n \to \infty} E_{\gamma_n(\delta)} [H_n^{-1}(\hat\gamma - \gamma_n(\delta))] = E_{P_{\gamma_0}} Z.$$

Consider the problem $\min_c E_{P_{\gamma_0}} \rho(Z + c)$, where $\rho(z) = z'z$. The solution of this minimization problem is
to set $c = -EZ$. Suppose that $c \ne 0$, then

$$E_{P_{\gamma_0}} \rho(Z + c) < E_{P_{\gamma_0}} \rho(Z), \qquad (B.8)$$

where by Lemma 3.1 the lhs of (B.8) is the asymptotic average risk of the sequence of estimators $\hat\gamma + H_n c$
and the rhs of (B.8) is the asymptotic average risk of the sequence of posterior means $\hat\gamma$, which contradicts
the asymptotic average risk efficiency of the posterior mean established in Lemma 3.1.

To show claim 2, note that by Theorem 3.2 and definition of weak convergence

$$\lim_{n \to \infty} P_{\gamma_n(\delta)}\{(\hat\gamma(\tau))_j \le (\gamma_n(\delta))_j\} = \lim_{n \to \infty} P_{\gamma_n(\delta)}\{(Z_n(\tau))_j \le 0\} = P_{\gamma_0}\{(Z(\tau))_j \le 0\},$$

since 0 is assumed to be a continuity point of the distribution of $(Z(\tau))_j$. Consider the problem

$$\min_c E_{P_{\gamma_0}} \rho((Z(\tau))_j - c; \tau),$$

where $\rho(z;\tau) = (1(z > 0) - \tau) z$. Note that the quantity $E_{P_{\gamma_0}} \rho((Z(\tau))_j - c; \tau)$ is finite for any $c$ by (B.7)
or by Lemma 3.1. A solution of this problem is given by the root of the first order condition

$$P_{\gamma_0}\{(Z(\tau))_j > c\} = \tau \quad \text{or} \quad P_{\gamma_0}\{(Z(\tau))_j < c\} = 1 - \tau, \qquad (B.9)$$



i.e. $c$ = $(1 - \tau)$-th quantile of $(Z(\tau))_j$ (under the condition that $(Z(\tau))_j$ has positive density in any small
neighborhood of 0). Suppose $c \ne 0$, then

$$E_{P_{\gamma_0}} \rho((Z(\tau))_j - c; \tau) < E_{P_{\gamma_0}} \rho((Z(\tau))_j; \tau), \qquad (B.10)$$

where by Lemma 3.1 the lhs of (B.10) is the asymptotic average risk of the sequence of estimators defined as
$(\hat\gamma(\tau) - H_n c)_j$, and the rhs is the asymptotic average risk of the posterior $\tau$-th quantile $(\hat\gamma(\tau))_j$. See section
3.4 and Lemma 3.1 for definitions. Then (B.10) contradicts the asymptotic average risk efficiency of the
posterior quantiles under the check function loss established in Lemma 3.1.
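
The first order condition (B.9) can be verified directly from the definition of the check function. The following is a sketch for a scalar random variable $Z$ with a continuous distribution and finite mean, writing $(a)_+ = \max(a,0)$ (notation introduced here for illustration):

```latex
\begin{aligned}
E\,\rho(Z - c;\tau) &= (1-\tau)\,E\,(Z - c)_+ \;+\; \tau\,E\,(c - Z)_+ , \\
\frac{\partial}{\partial c}\, E\,\rho(Z - c;\tau) &= -(1-\tau)\,P(Z > c) \;+\; \tau\,P(Z < c) , \\
0 = -(1-\tau)\,P(Z > c) + \tau\,\bigl(1 - P(Z > c)\bigr)
  &\;\Longleftrightarrow\; P(Z > c) = \tau \;\Longleftrightarrow\; P(Z < c) = 1 - \tau .
\end{aligned}
```

The objective is convex in $c$, so any root of this condition is a minimizer, and $c = 0$ is optimal exactly when $P(Z > 0) = \tau$.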

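Thus it must be that $c = 0$ in (B.9), so that the first part of claim 2, equation (3.11), is proven. The
second part of claim 2, equation (3.12), is immediate from (B.9) with $c = 0$ for $\tau = \tau'$ and $\tau = \tau''$:

$$\begin{aligned}
\lim_{n \to \infty} P_{\gamma_n(\delta)}\{(\hat\gamma(\tau'))_j \le (\gamma_n(\delta))_j \le (\hat\gamma(\tau''))_j\} &= \lim_{n \to \infty} P_{\gamma_n(\delta)}\{(Z_n(\tau'))_j \le 0 \le (Z_n(\tau''))_j\} \\
&= 1 - \lim_{n \to \infty} P_{\gamma_n(\delta)}\{(Z_n(\tau''))_j < 0\} - \lim_{n \to \infty} P_{\gamma_n(\delta)}\{(Z_n(\tau'))_j > 0\} \qquad (B.11) \\
&= 1 - P_{\gamma_0}\{(Z(\tau''))_j < 0\} - P_{\gamma_0}\{(Z(\tau'))_j > 0\} = \tau'' - \tau'. \quad ■
\end{aligned}$$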


B.4.  Proof of Theorem 3.4. Consider at first two useful lemmas.

Lemma B.1 (Integral Convergence of $\ell_n(z)$). Suppose that (1) $\ell_n(z)$ has the properties specified in
Lemma C.2 and (2) $\ell_n(z)$ converges marginally to $\ell_\infty(z)$ under a parameter sequence $\gamma_n(\delta) = \gamma_0 + H_n\delta$.
Then (a) $\ell_\infty(z) > 0$ in some ball at zero a.s., (b) for any vector-valued continuous function $g(z)$ dominated
by a polynomial as $z \to \infty$

$$\int_{K_n} g(z) \frac{\ell_n(z)\mu(\gamma_0 + H_n z)}{\int_{K_n} \ell_n(z')\mu(\gamma_0 + H_n z')\,dz'}\,dz \to_d \int_K g(z) \frac{\ell_\infty(z)}{\int_K \ell_\infty(z')\,dz'}\,dz.$$

Here $K_n$ is either the set $\{z : \gamma_n(\delta) + H_n z \in \mathcal{G}\}$, in which case $K = \mathbb{R}^d$; or $K_n = K$ is a fixed cube centered
at the origin. [Convergence in distribution here is taken under any local sequence $\gamma_n(\delta)$.]

Lemma B.2 (Convexity Lemma). Suppose $\{R_n\}$ is a sequence of $\mathbb{R}$-valued random functions, defined
on $\mathbb{R}^d$. If $R_n$ converges in distribution in finite-dimensional sense to $R_\infty$, i.e. for any $l < \infty$
$(R_n(z_j), j \le l) \to_d (R_\infty(z_j), j \le l)$, where $R_\infty$ is convex and finite on an open non-empty set a.s., then
$\arg\inf_{z \in \mathbb{R}^d} R_n(z) \to_d \arg\inf_{z \in \mathbb{R}^d} R_\infty(z)$.

Proof of Lemma B.1 Assertion (a) is a special case of Lemma 1.5.1 in Ibragimov and Has'minskii (1981b).
Assertion (b) is proven on p. 106-109 of Ibragimov and Has'minskii (1981b) under more general conditions
than conditions (1) and (2). ■

Proof of Lemma B.2 See Davis, Knight, and Liu (1992) and Pollard (1991). ■

The first part of the proof of Theorem 3.4 is done by setting the true parameter sequence $\gamma_n(\delta) = \gamma_0$.
Considering a general sequence does not change the argument but significantly complicates notation.

Write  Zn  (r)  =  (c(t)  -  r„  (70)).  Note  that 

Zn  (r)  =  arg  inf  T„  (z) ,  I\,  (i)  =  /    p (z  -  r„  (70  +  Hnz)  +  rn  (70)  ; r)         /"(*)/X  (.7°  +  %"**     .  ,dz. 
*eR  JRd  JRden(z')p,(-ro  +  Hnz')dz' 
Since $|r_n(\gamma_0 + H_n z) - r_n(\gamma_0) - R'z| = O(z \cdot |H_n z|^{a'})$ for $a' > 0$, it is the case by the properties of the
check function that

$$|\rho(\bar z - r_n(\gamma_0 + H_n z) + r_n(\gamma_0); \tau) - \rho(\bar z - R'z; \tau)| \le 2\,|O(z \cdot |H_n z|^{a'})|,$$
for any $z$. Hence by Lemma B.1

in  (z)  p.  (70  +  Hnz) 


(B.12) 


Tn  (z)  =f    (p(S-  Hz;  r)+0(z-  \Hnz\"))        /77wt^L^ 
Jk*  jRi  in  (z')ft  (70  +  Hnz')  dz' 

f          I-         D'          \               In  (z)  ft  (jO  +  H„Z) 
=    /       p(z-Rz\T)-f „       .      . ; — .     ,       dz  +  Op(l)  . 

./if*  fRdln(z')ft(yo+Hnz')dz'  pv; 

Applying  Lemma  B.l  again,  it  follow  that  the  marginal  limit  of  r„  (5)  is  given  by 

Recall  that  Z  (r)  denotes  the  minimizer  of  T^,  (z).  By  Lemma  B.2 

Zn  (t)  ->d  Z  (r)  . 

Thus,  in  what  follows  it  suffices  to  consider  only  linear  functions  such  that  r„  (70  +  Hnz)  —  r„  (70)  —  R'z  =  0 
for  all  z. 

Next  we  need  to  establish  the  uniform  integrability  for  Zn  (r).  Consider  linear  transformation  f  =  M'z 
defined  by  the  nonsingular  matrix  M  such  that  R  is  a  column  of  M .  Then,  the  likelihood  for  £  given  by 
£n  (M_I£)  has  the  properties  (for  some  c  >  0  and  c'  >  0): 

(a)  E^\&2  (M-'O  -  en/2  (M~lt)  I2  <  cW  -  e"|  (1  +  2|e'|  V  jf '|) , 

(b)  EP^*{M-li)<e-c'<M-l\ 

by  nonsingularity  of  M  and  Lemma  C.2.  By  Theorem  1.5.2  of  Ibragimov  and  Has'minskii  (1981b)  for  any 
local  sequence  of  -yn  (<5)  =  70  +  Hn5,5  e  Rd  and  any  N  >  0 

lim       H-NP,n(s){\Zn(r)\>H}  =  0, 

ri  —too, n— too 

(-  \  l-         \  (B13) 

hence    hm^  EP^  {s)p  \Zn  (r) ;  r 1  =  J5p70  p  ^Z  (t)  ;  t  j  <  00. 

Then  identically  to  the  steps  in  the  proof  of  Lemma  3.1,  it  can  be  concluded  that  {c(t)}  minimizes  the 
asymptotic  average  risk  in  the  sense  of  achieving  the  infimum  of 

lim  sup  lim  sup —T^r  /    EP^       p{Zn\T)d&. 

KfRd        n-too     A("-)Jk 

over  all  statistic  sequences  {£}  which  are  measurable  functions  of  sample  (Yj, Xi, i  <  n),  where  Zn  = 
(c  —  r„  (7n  (5))).  The  rest  of  the  argument  that  establishes  the  r-posterior  quantiles  are  (1  —  r)-quantile 
unbiasedness  and  resulting  coverage  properties  is  identical  to  the  proof  of  Theorem  3.3  ■ 

B.5.  Proof  of  Lemma  3.1.  Claim  1  is  just  a  special  case  of  Theorem  1.1  of  Lehmann  and  Casella  (1998), 
Chapter  5.  Claim  2  follows  by  an  argument  similar  to  that  given  by  Ibragimov  and  Has'minskii  (1981b) 
p. 93.  Details  are  omitted  for  brevity.  See  Chernozhukov  and  Hong  (2003).  ■ 
Appendix C. Important Lemmas

Let $\gamma = (\beta, \alpha)$ and $h = (h_\beta, h_\alpha)$. Define the standard Hellinger distance

$$r_2^2(\gamma; \gamma + h) = \int\!\!\int \left| f^{1/2}(y - g(x, \beta + h_\beta); x, \gamma + h) - f^{1/2}(y - g(x, \beta); x, \gamma) \right|^2 dy\, F_X(dx)$$

(note that $F_X(dx)$ is taken outside the $|\cdot|^2$-brackets, since it does not depend on the parameters).


Lemma C.1 (Hellinger Distance Properties). Under C0-C5, there are $a > 0$ and $A > 0$ such that for all $h$ such that $\gamma + h \in \mathcal{G}$, uniformly in $\gamma$,

(a) $r_2^2(\gamma; \gamma + h) \ge a \cdot \max(|h_\beta|, |h_\alpha|^2)$, (b) $r_2^2(\gamma; \gamma + h) \le A\,(|h_\beta| + |h_\alpha|^2)$. (C.1)
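The linear (rather than quadratic) dependence of the Hellinger distance on the location shift $h_\beta$ is what distinguishes the jump case from a smooth likelihood. A minimal numerical sketch of this, assuming a purely illustrative density $f(\epsilon) = e^{-\epsilon} 1(\epsilon \ge 0)$ with no regressors and no shape parameter (this density is a hypothetical stand-in, not the paper's model), for which the distance has the closed form $2(1 - e^{-|h|/2})$:

```python
import math

def sqrt_f(e):
    """Square root of the illustrative density f(e) = exp(-e) 1(e >= 0) (an assumption)."""
    return math.exp(-e / 2.0) if e >= 0.0 else 0.0

def hellinger_sq_numeric(h, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of int |f^{1/2}(y - h) - f^{1/2}(y)|^2 dy."""
    lo = min(0.0, h)
    width = (upper - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * width
        d = sqrt_f(y - h) - sqrt_f(y)
        total += d * d * width
    return total

def hellinger_sq_exact(h):
    """Closed form for the same illustrative density: 2 (1 - exp(-|h| / 2))."""
    return 2.0 * (1.0 - math.exp(-abs(h) / 2.0))

if __name__ == "__main__":
    for h in (0.01, 0.05, 0.1):
        # The ratio r^2 / |h| stays close to 1: the distance is of order |h|, not |h|^2.
        print(h, hellinger_sq_numeric(h), hellinger_sq_exact(h) / h)
```

In this toy case the squared distance behaves like $|h|$ for small $h$, which matches the $|h_\beta|$ term in (C.1); a smooth shape parameter would instead contribute a term of order $|h_\alpha|^2$.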


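Lemma C.2 (Exponential Tails and Hölder Continuity). Given (C.1), for some $n_0$ and all $n \ge n_0$,

for any $z$, $z'$ such that $\gamma + H_n z \in \mathcal{G}$ and $\gamma + H_n z' \in \mathcal{G}$, and some $a' > 0$, uniformly in $\gamma$,

$$E_{P_\gamma}\, \ell_n(z)^{1/2} \le e^{-a'|z|}, \qquad E_{P_\gamma} |\ell_n(z)^{1/2} - \ell_n(z')^{1/2}|^2 \le A\,|z - z'|\,(1 + 2 \cdot |z'| \vee |z|). \qquad \text{(C.2)}$$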


C.l.  Proof  of  Lemma  C.2.  For  some  B  >  0  and  |  ■  |  denoting  the  sup  norm, 


,„  (i) 
EPJn(z)1/2  < 


1  -  2  r2  h>  P  +  u/n> a  +  V/Vn) 


:2>  n     2/       «.       /  .       /    /--\    (3)  „         ■""(l-I.H?)  (4)  m.«(M.I"l2)     (6)        -.I-I  +  - 


where the constant $K_{\mathcal{G}}$ depends only on the diameter of the parameter space $\mathcal{G}$; (1) follows by the standard manipulations of the Hellinger distance, as on p. 260 in Ibragimov and Has'minskii (1981a); (2) follows by the inequality $(1 - r) \le e^{-r}$ when $r \ge 0$; (3) is given by (C.1); and (4) and (5) are obvious. Also,

$$E_{P_\gamma} |\ell_n(z)^{1/2} - \ell_n(z')^{1/2}|^2 \overset{(1)}{\le} n\, r_2^2\big(\gamma + (u/n, v/\sqrt{n});\ \gamma + (u'/n, v'/\sqrt{n})\big) \overset{(2)}{\le} A\,(|u - u'| + |v - v'|^2)$$

$$\overset{(3)}{\le} A\,(|z - z'| + |z - z'|^2) \overset{(4)}{\le} A\,|z - z'|\,(1 + 2 \cdot |z'| \vee |z|),$$

where (1) follows by the standard manipulation of the Hellinger distance, as on p. 260 in Ibragimov and Has'minskii (1981a), (2) is given, and (3) and (4) are obvious. ■
C.2. Proof of Lemma C.1. In order to establish (C.1)(b), let $\gamma = (\beta, \alpha)$ and $h = (h_\beta, h_\alpha)$,
rl{T,l  +  h)<Ex  j  (/1/2  (y-g(X,l3+  M  I*;/?  +  fc/j,  Q  +  ha)  -  fi/2  (y  -  g  (X,0)  \X;1))2  dy 


(i) 
<  Ex 


f 

■Hsix,, 


\f  (y  -  g(X,p  +  h0)\X;P  +  h0,a+ha)  -  f(y  -  g(X,0)\X;i) 


dy 


'[9{X,0),g(x,0+h/))] 

+  Ex  j  (/'/2  (y-  g(X,0  +h0)\Xn  +  h)  -  f1/2  (y-  g(X,p +  hp)\X;fi +  hp,a)Y 

J\9iX.0),9(X'0  +  ht))Y 


dy 


Ex 


J\9<.x,0),g(x„ 


(X,8+h0)]c 


f  (y  -g(X,0  +  M  \X;p  +  he,a)  -  /  (y  -  g  (X,0)  \X;-y) 


dy 


(2) 


<  2Ex\g(X,p  +  hfi)-g(X,0)\f 


da 


+  \ha\2  I    Ex  [>df      (y-9(X,/3  +  h0)\X,p  +  hg,a  +  uJh. 
+  ]h0]lExf\ 


dyduj 


(3)     _ 
< 


2f\hs\Ex  [ 
Jo 


\df(y-g  (X, p  +  whff)  \X,  §_  +  whe,  a) 

dp 

\dg(X,p  +  u,hg) 


dp 


dydw 
du  +  O  (\ha\2)  +  0  (\h0\) 


=  0(\he\)  +  0(\ha\2) 


where $[a,b] = [a,b]$ if $a \le b$ and $= [b,a]$ if $b < a$, and the bound is uniform in $\gamma$. The first inequality follows by the triangle inequality and from $|a - b|^2 \le |a^2 - b^2|$ for $a \ge 0$ and $b \ge 0$. The first term in the second
inequality  follows  from  the  fact  that  \f  (|)  |  <  /.  The  second  and  third  terms  in  the  second  inequality  are 
by the Taylor expansion and Fubini. The first term in the third inequality follows from Taylor expansion and Fubini. The second term in the third inequality follows from C4, while the third term in that inequality is by C2.

The lower bound, equation (C.1)(a), is established by considering separately $|h| < \delta$ for some sufficiently small $\delta$ and $|h| \ge \delta$.

Indeed, for sufficiently small $\delta$ and $|h| < \delta$ it is shown below that

$$r_2^2(\gamma; \gamma + h) \ge \text{const} \cdot \max(|h_\beta|, |h_\alpha|^2). \qquad \text{(C.3)}$$

On the other hand, by the identification condition C0, for all $|h| \ge \delta$ such that $\gamma + h \in \mathcal{G}$,

$$r_2^2(\gamma; \gamma + h) \ge \epsilon_\delta > 0. \qquad \text{(C.4)}$$

Hence for some $a > 0$ the bound in (C.1)(a) is immediate from (C.3)-(C.4). It remains to prove (C.3) for $|h| < \delta$ for some sufficiently small $\delta$. Write


■■■'(-■       ■'"      >■'•./  (f,/2(y-g(X,p  +  hg)\Xn  +  h)-fi'2(y-g(X,p)\X--y)Ydy 

l[9(X,e),g(x,(!+hl))]  x  ' 


Ex  f  (f1/2(y-g(X,P  +  hs)\X;1  +  h)-f,/2(y-g(X,p)\X;1)Ydy. 


>  const  Ex 


dg{X-p)\ 


>  const  \hg\ 


For  small  hg,  we  can  bound  /  from  below  uniformly  in  7  by 

Ex^\g(X,p  +  hg)-g(X,fi)    pI/2  (X,f)  -  q,/2  (X,7) 

using  assumption  C3  and  Taylor  expansion.  On  the  other  hand,  by  C1-C3,  bound  //  from  below  by: 

Ex  f  (fi/*(y-g(X,p  +  hg)\X-1  +  h)-fi/2(y-g(X,0)\X-y))2dy 

J\g{x,e),9(x.g+h9)Y  v  J 


l[g(X,3),g(x,B+hg)Y 
X  f  (* 

J\g(X.0),g(X.e  +  h„)Y     \ 


dfll2(y-9(X\P)h) 


) 


dy-o(\h\2) 


l[g(X,0),g(x,0+hl))]c   V  <9t 

Under  C2,  C3  and  C4(a),  a  further  lower  bound  is  \h\2  inf,u|=1  Ex  Ju(x,fS),g<x.0+hg)Y  (^5(!,7f  ,g);i,)'u)    dy 
f  ff^(y-g(X,/3)-n)' 


>i*r   w& 


<>, 


u  1    dy  +  O  (|/ig|)  1  >   const  |/i|    >   const  |/iQ|  , 


for sufficiently small $|h|$, where the remainder term $O(|h_\beta|)$ arises from neglecting the integrand over the small area $[g(X, \beta), g(X, \beta + h_\beta)]$ and using the bounds in C2 and C3 to do so. On the other hand, if assumption C4(b) holds, the uniform lower bound is

,a/'/2(y-g(*;/?)l7)V 

da 


Ex 


f(«: 


&•) 


*  inf  Ex  J    I ^     alX'^'^  u)    dV>   const  l^|2 


Conclude $\inf_\gamma r_2^2(\gamma; \gamma + h) \ge \text{const} \cdot \max(|h_\beta|, |h_\alpha|^2)$.
Table 1. Estimator Performance for Intercept $\beta_0$ (Based on 500 repetitions).

Estimator           RMSE     MAD      Median AD   p-th loss (p=10)
n=100
Posterior mean      0.0114   0.0081   0.0059      0.0401
Posterior median    0.0115   0.0078   0.0054      0.0433
MLE                 0.0127   0.0088   0.0063      0.0369
n=400
Posterior mean      0.0029   0.0021   0.0015      0.0104
Posterior median    0.0030   0.0020   0.0014      0.0106
MLE                 0.0034   0.0023   0.0015      0.0103

Table 2. Estimator Performance for Slope $\beta_1$ (Based on 500 repetitions).

Estimator           RMSE     MAD      Median AD   p-th loss (p=10)
n=100
Posterior mean      0.0212   0.0147   0.0105      0.0983
Posterior median    0.0214   0.0144   0.0096      0.1053
MLE                 0.0216   0.0145   0.0096      0.0693
n=400
Posterior mean      0.0053   0.0037   0.0026      0.0227
Posterior median    0.0054   0.0037   0.0025      0.0228
MLE                 0.0057   0.0038   0.0024      0.0201

Table 3. Comparison of Inference Methods: Coverage and Average Length of the Nominal 90% Confidence Intervals (Based on 500 repetitions).

Confidence Interval        Coverage: intercept   Length: intercept   Coverage: slope   Length: slope
n=100
Posterior interval         0.87                  0.0298              0.85              0.0555
Bootstrap: P-mean          0.88                  0.0392              0.86              0.0720
Subsampling: P-mean        0.83                  0.0416              0.82              0.0770
Limit process: P-mean      0.86                  0.0346              0.84              0.0587
Limit process: P-median    0.85                  0.0352              0.86              0.0610
Limit process: MLE         0.93                  0.0347              0.95              0.0653
n=400
Posterior interval         0.87                  0.0075              0.86              0.0140
Bootstrap: P-mean          0.84                  0.0089              0.88              0.0167
Subsampling: P-mean        0.82                  0.0085              0.83              0.0158
Limit process: P-mean      0.89                  0.0085              0.82              0.0145
Limit process: P-median    0.86                  0.0087              0.86              0.0150
Limit process: MLE         0.89                  0.0084              0.92              0.0157

Technical  Addendum,  Part  I 

This  addendum  includes  the  excluded  material,  such  as  omitted  simpler  calculations  and  simpler  proofs, 
and  some  background  material  on  point  processes.  The  addendum  will  be  made  available  as  a  part  of  an 
MIT  Economics  Department  Technical  Report  published  by  the  Social  Science  Research  Network. 


Appendix  D.  Point  Processes 

The  following  definitions  are  collected  from  Resnick  (1987). 

Definition D.1 ($M_p(E)$). Let $E$ be a locally compact topological space with a countable basis, and let $\mathcal{E}$ be the Borel $\sigma$-algebra of subsets of $E$. A point measure (p.m.) $p$ on $(E, \mathcal{E})$ is a measure of the following form: for $\{x_i, i \ge 1\}$, a countable collection of points (called points of $p$), and any set $A \in \mathcal{E}$: $p(A) = \sum_i 1(x_i \in A)$. If $p(K) < \infty$ for any compact $K \subset E$, then $p$ is said to be Radon. A p.m. $p$ is simple if $p(\{x\}) \le 1$ for all $x \in E$, and is compound otherwise. Let $M_p(E)$ be the collection of all Radon point measures. A sequence $\{p_n\} \subset M_p(E)$ converges vaguely to $p$ if $\int f\,dp_n \to \int f\,dp$ for all functions $f \in C_K(E)$ [continuous, real-valued, and vanishing outside a compact set]. Vague convergence induces the vague topology on $M_p(E)$. The topological space $M_p(E)$ is metrizable as a complete separable metric space. $M_p(E)$ denotes this metric space hereafter. Define $\mathcal{M}_p(E)$ to be the $\sigma$-algebra generated by the open sets.

Definition D.2 (Point Processes. Convergence in Distribution.). A point process in $M_p(E)$ is a measurable map $N : (\Omega, \mathcal{F}, P) \to (M_p(E), \mathcal{M}_p(E))$, i.e. for every elementary event $\omega \in \Omega$, the realization of the point process $N(\omega)$ is some point measure in $M_p(E)$. Weak convergence of the point processes $N_n$ taking values in $M_p(E)$ is the same as for any metric space, cf. Resnick (1987): we shall write $N_n \Rightarrow N$ in $M_p(E)$ if $E_P h(N_n) \to E_P h(N)$ for all continuous and bounded functions $h$ mapping $M_p(E)$ to $\mathbb{R}$. Note that if $N_n \Rightarrow N$ in $M_p(E)$, then $\int_E f(x)\,dN_n(x) \to_d \int_E f(x)\,dN(x)$ for any $f \in C_K(E)$ by the continuous mapping theorem.

Definition D.3 (Poisson Point Process). The point process $N$ is a Poisson point process with mean intensity measure $m$ on $(E, \mathcal{E})$ if

(a) for any $F \in \mathcal{E}$ and any non-negative integer $k$,

$$P(N(F) = k) = \begin{cases} e^{-m(F)}\, m(F)^k / k! & \text{if } m(F) < \infty, \\ 0 & \text{if } m(F) = \infty, \end{cases}$$

(b) if $(F_i, i \le k)$ are disjoint sets in $\mathcal{E}$, then $(N(F_i), i \le k)$ are independent random variables.
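Definition D.3 can be illustrated with a small simulation. The following is a minimal sketch, assuming the simplest special case $E = [0, T)$ with a constant-rate intensity $m(F) = \text{rate} \times |F|$; the function names and the constant-rate choice are illustrative assumptions, not part of the paper. Points are generated through independent exponential gaps, a standard construction for a homogeneous Poisson process on the line.

```python
import math
import random

def poisson_pmf(k, mean):
    """P(N(F) = k) from Definition D.3(a), with mean = m(F)."""
    if math.isinf(mean):
        return 0.0
    return math.exp(-mean) * mean ** k / math.factorial(k)

def simulate_points(rate, horizon, rng):
    """Points of a homogeneous Poisson process on [0, horizon) (illustrative special case)."""
    points = []
    t = rng.expovariate(rate)          # first arrival
    while t < horizon:
        points.append(t)
        t += rng.expovariate(rate)     # independent exponential gaps
    return points

def count_in(points, a, b):
    """N(F) for the interval F = [a, b)."""
    return sum(1 for x in points if a <= x < b)

if __name__ == "__main__":
    rng = random.Random(0)
    draws = [simulate_points(2.0, 1.5, rng) for _ in range(5000)]
    freq0 = sum(1 for pts in draws if count_in(pts, 0.0, 0.5) == 0) / len(draws)
    # Empirical frequency of an empty interval versus the formula with m(F) = 2.0 * 0.5.
    print(freq0, poisson_pmf(0, 1.0))
```

Counting the points in disjoint intervals with `count_in` gives the independent Poisson counts of part (b).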


Appendix E. Plausibility of C0-C5

This section illustrates the plausibility of conditions C0-C5 using Paarsch's (1992) independent private value auction model in Section 2.1. For clarity, consider the example used in the Monte-Carlo section: $\theta_2 = 1$ and $\theta_1 = \exp(\beta'Z)$, with the following assumptions. More general examples can be verified similarly. For example, $\theta_2$ can be made a function of regressors too, and verification proceeds similarly but is notationally much more burdensome.

Assumptions  (in  addition  to  those  listed  in  Section  2.1) 

A1 $(Y_i, X_i = (Z_i, m_i))$ are iid, $\beta \in B$, a compact convex subset of $\mathbb{R}^{d_\beta}$. $X \in \mathcal{X}$, a compact subset of $\mathbb{R}^{d_\beta + 1}$.

A2 $F_X(x)$ does not depend on $\beta$. $EZZ'$ is positive definite.

A3 $3 < m_i < M$ for some $M < \infty$, where $m_i$ is the (non-degenerate) number of bidders (minus 1).

Verification of C0: A1 implies iid sampling and the compactness and convexity of the parameter space stated in C0.

Under the stated parameterization,

$$g(X, \beta) = \exp(\beta'Z)\,(m - 1)/(m - 2)$$

and

$$f(\epsilon | X, \beta) = \frac{m\, g(X, \beta)^m}{[\epsilon + g(X, \beta)]^{m+1}}\, 1(\epsilon > 0).$$

Note that $\gamma = \beta$, and A2 and A3 imply that for $\beta \ne \beta'$, $g(X, \beta) \ne g(X, \beta')$ for some $X$ with positive probability. Since the density function $f(y - g(X, \beta) | X, \beta)$ is strictly monotone in $g(X, \beta)$, identification holds.

Verification of C1: Clearly the model is a one-sided model C1.(ii) in which $f(\epsilon | x, \beta) = q(x, \beta) = 0$ and

$$p(X, \beta) = \frac{m\, g(X, \beta)^m}{[g(X, \beta)]^{m+1}} = \frac{m}{g(X, \beta)}$$

is strictly bounded away from 0 and from above by A1 and A3.

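Verification of C2: $f(\epsilon | X, \beta)$, as defined, is obviously upper semi-continuous at $\epsilon = 0$, is maximized at $\epsilon = 0$ for each $X$ and $\beta$, and is uniformly bounded by A1. Its first derivative in $\epsilon$,

$$\frac{\partial}{\partial \epsilon} f(\epsilon | X, \beta) = -\frac{m + 1}{\epsilon + g(X, \beta)}\, f(\epsilon | X, \beta),$$

is continuous in $\epsilon$ except at $\epsilon = 0$ and bounded uniformly in $(\epsilon, X, \beta)$ by A1 and A3.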


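The first partial derivative of $f(\epsilon | X, \beta)$ in $\beta$,

$$\frac{\partial}{\partial \beta} f(\epsilon | X, \beta) = Z \left[ m - (m + 1) \frac{g(X, \beta)}{\epsilon + g(X, \beta)} \right] f(\epsilon | X, \beta),$$

is clearly continuous in $X$ and $\beta$ and bounded uniformly in $(\epsilon, X, \beta)$ by A1 and A3. This holds similarly for

$$\frac{\partial^2}{\partial \beta\, \partial \beta'} f(\epsilon | X, \beta) = ZZ' f(\epsilon | X, \beta) \left\{ \left[ m - (m + 1) \frac{g(X, \beta)}{\epsilon + g(X, \beta)} \right]^2 - (m + 1) \frac{g(X, \beta)}{\epsilon + g(X, \beta)} + (m + 1) \frac{g(X, \beta)^2}{[\epsilon + g(X, \beta)]^2} \right\}.$$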
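The derivative expressions used in the verification of C2 can be checked numerically. The following is a minimal sketch, assuming a scalar $\beta$ and scalar regressor $Z$ (a simplification made here for illustration), with $g = \exp(\beta Z)(m-1)/(m-2)$ and the Pareto-type density $f(\epsilon|X,\beta) = m\,g^m/[\epsilon + g]^{m+1}$ for $\epsilon > 0$. It compares the closed-form first and second derivatives in $\beta$ with central finite differences; the function names are hypothetical.

```python
import math

def g(beta, z, m):
    """Boundary function g(X, beta) for scalar beta and Z (illustrative simplification)."""
    return math.exp(beta * z) * (m - 1) / (m - 2)

def f(eps, beta, z, m):
    """Density f(eps | X, beta) = m g^m / (eps + g)^(m+1) for eps > 0, else 0."""
    if eps <= 0.0:
        return 0.0
    gv = g(beta, z, m)
    return m * gv ** m / (eps + gv) ** (m + 1)

def df_dbeta(eps, beta, z, m):
    """Closed form: Z [m - (m+1) g / (eps + g)] f."""
    gv = g(beta, z, m)
    return z * (m - (m + 1) * gv / (eps + gv)) * f(eps, beta, z, m)

def d2f_dbeta2(eps, beta, z, m):
    """Closed form: Z^2 f {[m - (m+1) r]^2 - (m+1) r + (m+1) r^2}, with r = g / (eps + g)."""
    gv = g(beta, z, m)
    r = gv / (eps + gv)
    a = m - (m + 1) * r
    return z * z * f(eps, beta, z, m) * (a * a - (m + 1) * r + (m + 1) * r * r)

def central_diff(fun, x, step):
    """Central finite-difference approximation of fun'(x)."""
    return (fun(x + step) - fun(x - step)) / (2.0 * step)

if __name__ == "__main__":
    eps, beta, z, m = 0.7, 0.3, 1.2, 4
    print(df_dbeta(eps, beta, z, m), central_diff(lambda b: f(eps, b, z, m), beta, 1e-5))
    print(d2f_dbeta2(eps, beta, z, m), central_diff(lambda b: df_dbeta(eps, b, z, m), beta, 1e-5))
```

Both printed pairs should agree to several digits, which is a quick sanity check on the algebra rather than a proof of the boundedness claims.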


All of the above quantities are continuous in $X$ for each $\epsilon$ and $\beta$. Finally, for some constant $C$


dp 
which  implies 


f(y-g(X,P)\X,0)  \=l(y>9(X,p)) 


m2g(X,pr 


,,771  +  1 


-9(X,P)Z 


sup  Ex 

0 


f\&{*-'&>fi\*'i*)\*sBxcy=x 


<i{y>-C)—^, 


dy  <  00. 


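Verification of C3: This is verified immediately by noting that

$$\frac{\partial}{\partial \beta} g(X, \beta) = g(X, \beta) Z \quad \text{and} \quad \frac{\partial^2}{\partial \beta\, \partial \beta'} g(X, \beta) = g(X, \beta) ZZ',$$

due to A1. Also note that

$$E\left[ \frac{\partial g(X; \beta)}{\partial \beta} \frac{\partial g(X; \beta)}{\partial \beta'} \right] = E\left[ g(X, \beta)^2 ZZ' \right] \ge c\, EZZ'$$

for some $c > 0$ due to A1 and A3. By A2, $EZZ'$ is positive definite.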

Verification of C4: Verification of C4 is not needed since the parameter $\alpha$ is not present.

Verification  of  C5:  When  the  true  parameter  is  7  so  that  e,  =  Y,  —  g(Xj,P),  we  have  that 

/  (Y,  -  g(X„0)\X,p)  =  m—l^S-^-^l  (Y,  -  g(X„p)  >  0)  . 


Therefore  by  Al  and  A3 


sup 
B 


£\nf(Y-g(X„0)\X„p) 


[e,+g(X„l3)r 


=  supl(Y,-g(X„P)>0) 
& 


(m+D  9ixrJK^ 


m  —  (m  +  1) 


g(x„p) 


[t,+g(X„p)}\ 


z, 


t,  +  g{Xi,P) 
<  C\  <  00, 


and 


sup 

3 


^\nf(Y,-g(X,J)\X,J) 


=  sapl{Yi-g(XiJ)>Q) 
g(x,J)


{m  +  d    9(x;J)      iZ, 


(m+1) 


[e,+g{X„p)}\ 


zX 


e,+g(X„p) 


<  Cz  <  00. 


Appendix  F.  Excluded  Simple  Derivations 
F.l.  In  Proof  of  Theorem  3.1.  These  additional  derivations  are  added  at  the  request  of  a  referee.  Write 

Tl 

<?!»(*)=  £>«(*)    xl„(«)  (F.l) 

t=i 

where $1_{in}(u) = \left[ 1(\epsilon_i > \{\Delta_n(X_i, u)/n \vee 0\}) + 1(\epsilon_i < \{\Delta_n(X_i, u)/n \wedge 0\}) \right]$.

At  first,  for  a  fixed  2,  we  use  a  Taylor  expansion  of  f,„(z)  for  each  i  and  plugging  this  back  into  expression 
for  Q\n{z) 


Tl    *— » 


,1  ^dg(Xup0+^)  d  ,     ((         An(X„n,n) 
5/3  dt 


In/ 


(« 


Xi ,  po  +  u/n,  Qo  + 


v/y/nj  l«n(«) 


t 


V-=V"*  5— ln/(«i|A'i,7o)  lin(u)  +«'-  V"  ^3  In/  I  e, 2 - — \X,,Po  +  uln/n,a0  +  v/y/n  )  l,„(u) 

y/n  '—^  oa  n  *—*  op  \  n  / 


1    , 
+  2V 


~~  /       a     a    ,f      e> A,,/30  +  ,"/rJ,Qo  +  V,„/VJ1      l,n(u) 

n  f— f  oaoa      \  n  I 

1  =  1  ' 


v, 


where $u_{in}$ is a point on the line between 0 and $u$ that depends on $u$ and $(\epsilon_i, X_i)$ (but not on other observations), and $v_{in}$ is a point on the line between 0 and $v$ that depends on $v$ and $(\epsilon_i, X_i)$ (but not on other observations). Thus, the terms are iid in the above summations.

Application of the Chebyshev inequality, the bounded density condition C1, and the bounded derivative condition C3 removes $1_{in}(u)$ in the term II, and adds $o_p(1)$ to the above expression. For example, consider


Var 


-^f^ln/(ei|*,-,7o)(l,-n(u)-l) 


<   const  •  {f/ff  ■  -f]E\lin(u)  -  1| 

—  Tt.    * * 


(F.2) 


Similarly 


Hence 


<   const(/7/)2-2-(/p')/7i  =  0(l). 
£i  E  £  ln  f^\X»^  (!.»(«)  -  1)  =  o(l). 

*  1  =  1 

Also, elementary calculations, as in (F.2), and application of the Chebyshev inequality show that, by the bounded density condition C2 and the bounded derivative condition C3, we can replace $u_{in}$, $v_{in}$ by 0 and remove $1_{in}(u)$ from the expressions for I, II and III, and add a term $o_p(1)$ to the whole expression. The application of the Markov LLN along with C5 allows us to replace each of the terms with its limit expectation, and gives the required conclusion.

F.2.  Proof  of  Independence  needed  in  the  Proof  of  Theorem  3.1.  These  additional  derivations  are 
added  at  the  request  of  a  referee.  The  proof  was  omitted  from  the  main  text,  because  it  follows  essentially 
from  the  standard  proof  concerning  the  independence  of  minimal  order  statistics  (extremal  processes)  and 
sample  averages  of  general  form  (partial  sum  processes),  see  e.g.  Resnick  (1986)  and  Lemma  21.19  in  van  der 
Vaart  (1999).  It  is  presented  here  for  completeness. 

Since, up to $o_p(1)$ terms, the only stochastic element in $(Q_n^c(z_j), j \le l)$ is $W_n$, we need to show that $(Q_n^d(u_j), j \le l)$ are asymptotically independent of $W_n$, where, ignoring the $o_p(1)$ term,

$$Q_n^d(u) = \sum_{i=1}^n \left[ \ln \frac{q(X_i)}{p(X_i)}\, 1(0 < n\epsilon_i \le \Delta(X_i)'u) + \ln \frac{p(X_i)}{q(X_i)}\, 1(0 \ge n\epsilon_i > \Delta(X_i)'u) \right].$$

For clarity we first present the proof for the case $l = 1$, and then discuss the changes needed to accommodate $l > 1$. For notational simplicity, we do not index $P$ by $\gamma$.

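Case I: $l = 1$. By an argument similar to that in (F.2), $W_n - \widetilde W_n = o_p(1)$, where

$$\widetilde W_n = \frac{1}{\sqrt n} \sum_{i=1}^n \frac{\partial}{\partial \alpha} \ln f(\epsilon_i | X_i, \gamma_0)\, 1\big(\epsilon_i > \{\Delta(X_i)'u/n\} \vee 0 \ \text{or}\ \epsilon_i < \{\Delta(X_i)'u/n\} \wedge 0\big).$$

Therefore it suffices to show asymptotic independence between $\widetilde W_n$ and $Q_n^d(u)$. Define

$$Q_n^p(u) = \sum_{i=1}^n \left[ 1(0 < n\epsilon_i \le \Delta(X_i)'u) + 1(0 \ge n\epsilon_i > \Delta(X_i)'u) \right]$$

and

$$Q_\infty^p(u) = \sum_{i=1}^\infty \left[ 1(0 < J_i \le \Delta(X_i)'u) + 1(0 \ge J_i' > \Delta(X_i')'u) \right].$$

Then asymptotic independence between $\widetilde W_n$ and $Q_n^d(u)$ follows by the Portmanteau Lemma and proving that for any real $x, y$ and integer $k$:

$$\limsup_n P\{Q_n^p(u) = k,\ Q_n^d(u) \le x,\ \widetilde W_n \le y\} \le P\{Q_\infty^p(u) = k,\ Q_\infty^d(u) \le x\}\, P\{W \le y\}. \qquad \text{(F.3)}$$

To show (F.3), proceed in two steps. In Step 1 below, invoking iid sampling, it is shown that

$$P\{Q_n^p(u) = k,\ Q_n^d(u) \le x,\ \widetilde W_n \le y\} = P\{Q_n^p(u) = k,\ Q_n^d(u) \le x\} \cdot P\{\sqrt{(n-k)/n}\ \overline W_n \le y\}, \qquad \text{(F.4)}$$

where

$$\overline W_n = \frac{1}{\sqrt{n-k}} \sum_{i=1}^{n-k} \frac{\partial}{\partial \alpha} \ln f(\bar\epsilon_i | \bar X_i, \gamma_0), \qquad \text{(F.5)}$$

where $\bar\epsilon_i, \bar X_i$ for $i \le n - k$ are i.i.d. draws from the distribution of $\epsilon_i, X_i$ conditional on

$$A(\epsilon_i, X_i) = \{\epsilon_i > \Delta(X_i)'u/n \vee 0 \ \text{or}\ \epsilon_i < \Delta(X_i)'u/n \wedge 0\}.$$

Step 2 applies the CLT to show that $\overline W_n \to_d W$. Finally, the convergence $N_n \Rightarrow N$ implies, similarly to the proof of Theorem 3.1 and by the Portmanteau Lemma, that

$$\limsup_n P\{Q_n^p(u) = k,\ Q_n^d(u) \le x\} \le P\{Q_\infty^p(u) = k,\ Q_\infty^d(u) \le x\}.$$

Thus the proof is complete given Steps 1 and 2.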

Step 1. Define $p_n = P\{A(\epsilon_i, X_i)^c\}$. By i.i.d. sampling in C0, the left hand side of (F.4) can be written as


(l)pUl-Pn)n-kP\         J2         M"«»,*i)<* 


;=n-i  +  ] 


1  (0  <  nti  <  A(Xj)'u)  +  1  (0  >  ru,  >  A(X,)'u)  =  1,  for  i  =  n-  k  +  1,. 


,...,71  > 


XP{^g£'n/(f'|X-'70)^ 


=  P{Q^M=k,Qi(u)<x} 
e,  >  {A(Xj)'u/n}  V  0  or  e,  <  {A(X,)'u/n}  A  0,  for  a 


lit  <  n-  k  >, 


where  function  /„  is  defined  in  Theorem  3.1  and  W„  in  (F.5). 
Step  2.  W„— >d  W  follows  by  checking  three  conditions: 

(a)  £Wn->0, 

(b)  Var  (w„)  -»  Vor(W),  and 

(c)  Lindeberg's  condition  is  satisfied. 

Condition  (a)  requires  EW„  =  Vn  -  kE-§^  In  /  (ii\Xi,~fo)  ->  0.  This  is  true  because 

E*  />!(<,,  x,)£ln/  W*.>7o)  /(f,|X,7o)^,  _        /= -Sx^(«.,x.)«£ln/M^7o)/(*i|*,-,7o)<fei 


EW„  =Vn^k 


=  —  vn- fc- 


P{>4(f„X,)} 
By  C2  limn-,00  P{A(e,,X,)}  =  1.  In  addition,  for  large  enough  n: 


P{A((„X,)} 


-Ex   f  ^-\nf(e,\X,no)f(u\X„1o)de,<2U'/f)(9f)(b 

JAUi.Xi)*  oa  n 


Hence  £W„  — >  0.  A  similar  calculation  verifies 


Var(wn)  =JS^ln/(e,-|X„7o)^ln/(ei|X1-,7o)'-    E—  In/  (e,|X1)7o) 
^££ln/(e,|XI,7o)^ln/(e1|X!,7o)'  =  Var(W„). 


The  final  step  is  to  verify  the  Lindeberg  condition,  that  for  all  A  ^  0  and  all  f  >  0, 

n-k 

>  &n-k\/n] 


^£#lln>NW)i( 


V^ln/^IXi.To) 


where  we  define 


2 
°n-k 


^T,Var\x'i:lQf(^^) 


n  —  k 


=  X'Var 


da 


ln/(ei|Xi,7o) 


(F.6) 


(F.7) 


Since  a\_k  converges  to  a  positive  constant  by  (b),  the  conclusion  (F.6)  follows  from 

^(A'^ln/(e-,|X,,7o))    ^(V^ln/^pG^o))    / P{A(u, Xt)}  <  oo; 

which,  in  turn,  follows  from  limn_nx,  P{J4(e,, -X,)}  =  1  and  by  C4. 

Case  II:  I  >  1.  This  involves  more  tedious  notations  but  follows  the  same  logic.  Note  that  W„-W,  =  ov  (1), 
where 

Wn  =  ^y"#-ln/(€,|X,,7o)l(e,  >{A(X0'%/n}V0ore,  <  {^{X^'uj/n}  A  0,  for  all  j  <  i)  • 

vn  ~^  oa 

Hence  it  suffices  to  show  the  asymptotic  independence  between  (Q^  (uj)  ,j<l)  and  Wn.  By  the  Portman- 
teau Lemma  it  suffices  to  show  that  for  any  real  Xj  and  y  and  any  integers  kj  and  k  for  j  =  1,  ...,l 

limsupP  {qI  (m)  =  kuQdn  (Ul)  <  xu...,Qpn  (u,)  =  k,,Qdn  (ui)  <  xhQi  =  k,  W„  <  y\ 


<P{QPX  («i)  =  fci.Qi  («i)  <  x,,  ...,(&,  (ti,)  =  fc,,Qi  («i)  <  *,,QU  =  *}  ■  -P{W  <  y} 


(F.8) 


where 


Q*n=^2(l[0  <  ne,  <  A  (X,)'  uj,  for  some  j  <  l]  + 1  [0  >  net-  >  A  (X,)'  u,,  for  some  j  <  I]) 


and 


QU  =  ^  fl  [0<  J,  <  A  (Xi)'  uj,  for  some  j  <  l]  +  1[0  >  J\  >  A  (AT/)'  jy,  for  some  j  <  I]) 

By  iid  sampling  in  CO,  the  left  hand  side  of  (F.8)  (without  limsup)  equals 
P  {QKu,)  =  huQiim)  <  xU-,Q*(ut)  =  khQi(ut)  <  xi,Q(n  =  k) 

„f  i  v;'« 


da 


\i>f(ei\Xi,jo)<y 


€i  >  {A(Xj)' uj/n]  V  0  or  c,  <  {A(X,-)'tt//n}  A  0,  for  all  j  <  J,  for  alii  <  n  -  fc  I, 


-p{v^?w"-!'} 


where  W,   =    -J=f^°=1   ^  In  /  I  e,|X,,7o)  and  e,,X,   are  i.i.d.     draws  from  the  distribution  of  a,Xi 
conditional  on 

A(u,Xi)  =  {e,  >  {A(X,)'uj/n}  V  0  or  u  <  {A(A\)'u_,/ti}  A  0,  for  all  j  <  I}. 
Similarly to the proof of Theorem 3.1, convergence N => N implies by the Portmanteau Lemma that
limsup  p{Qpn  (ui)  =  ki,Qi  (ui)  <  xi,..., QS  (w)  =  fo.On  (w/)  <  x,,Q|,  =  A:} 

<p{Q^(ui)  =  *i,Q£,(ui)<zi,...,Q^(ui)  =  fci,Q»(ui)<x,,QU  =  *} 

Finally,  that  limn_,<x,  f{1/ZL^Wn  <  y}  =  P  {VK  <  y}  follows  similarly  to  the  proof  of  (a)-(c)  in  Step  2  in 
Case  I  (when  1  =  1).  ■ 

F.3.  Proof  of  Lemma  3.1.  Claim  1  is  just  a  special  case  of  Theorem  1.1  of  Lehmann  and  Casella  (1998), 
Chapter  5.  Claim  2  follows  by  an  argument  similar  to  that  given  by  Ibragimov  and  Has'minskii  (1981b) 
p.93. 

Let  Z*  =  H„'  (7,.^,„  —  7n(<5)),  where  index  5  emphasizes  the  dependence  of  the  distribution  of  Zt  on  the 
local  parameter  sequence  7n(<5).  Define 

I(K)  =  limsup  In(K),    In(K)  =  -^—  [  EP p(Zsn)d6. 

It  follows  from  Fatou's  lemma  and  conclusion  (B.7)  that  I(K)  =  -^rj^.  JK  EP_lop(Zao)d&  =  EP^0p(Z).  Thus 
I  =  limsup^R,;  I(K)  =  EP^op{Z). 

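Next let $Z_n^\delta(K) = H_n^{-1}(\hat\gamma_{\rho,\lambda_K,n} - \gamma_n(\delta))$, where $\hat\gamma_{\rho,\lambda_K,n}$ is the Bayes estimator defined with respect to the loss function $\rho$ and prior $\lambda_K(\gamma) = 1\{H_n^{-1}(\gamma - \gamma_0) \in K\}$. Define

$$II(K) = \limsup_n II_n(K), \qquad II_n(K) = \frac{1}{\lambda(K)} \int_K E_{P_{\gamma_n(\delta)}}\, \rho(Z_n^\delta(K))\, d\delta.$$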


d 


By  Lemma  B.2  and  Lemma  B.l  it  follows  that  for  any  <eK 

Zsn(K)-^dZ5(K)  =  axg  inf    /  p{z  -  {r,  -  8))      ^_~  ^..dr,. 

The  property  -  P7n(6){Z*(iir)  >  |K|  +  <$}  =  0,  for  |/C|  =  sup{|z|  :  2  6  K}  for  any  <5  e  IIId  and  n  <  00,  - 
provides  the  necessary  uniform  integrability  to  conclude  limn_Kx,  EP  p(Z„(K))  =  EP  p(Zs(K));  which 
by  Fatou's  lemma  implies  that  II(K)  =  Jk('K.  fK  E pn  p(Zs (K))d5 . 

By the finite-sample average risk efficiency of the Bayes estimator $\hat\gamma_{\rho,\lambda_K,n}$,

$$II_n(K) \le I_n(K) \ \text{for each } n, \ \text{hence } II(K) \le I(K) = I.$$

Then $\limsup_{K \uparrow \mathbb{R}^d} II(K) = I$ follows from (a) $II(K) \le I$ for each $K$, (b) noting that for any $\delta \in \mathbb{R}^d$, as $K \uparrow \mathbb{R}^d$, $Z^\delta(K) \to_p Z$, and (c) the dominated convergence theorem, as shown below.

Claim 2 now follows. Indeed, suppose there exists an estimator sequence $\{\hat\gamma_n\}$ that achieves a strictly lower asymptotic average risk. Define $Z_n^\delta = H_n^{-1}(\hat\gamma_n - \gamma_n(\delta))$; then it must be that for some $K$, $n_0$, and infinitely many $n \ge n_0$, $\frac{1}{\lambda(K)} \int_K E_{P_{\gamma_n(\delta)}}\, \rho(Z_n^\delta)\, d\delta < II_n(K)$, which contradicts the finite-sample average risk efficiency of the Bayes estimator $\hat\gamma_{\rho,\lambda_K,n}$ for each such $n$. ■

Proof  of  limsupKTR<i//(X)  =  I.  Rewrite  II(K)  <  I(K)  as  JK  EP^o  [p(Zs(K))  -  p(Z)]  d5/X(K)  <  0  or 

J  E",o  [p(ZS(K))-p{Z)]+dS/^K)-J  EPno  [p{Zs{K))-p(Z)]~  d8/\{K)  <  0.  (F.9) 


Next  as  r(K)  — >  oo 

f  EPlo  \p(Zs(K))  -  p(Z)] '  dS/X(K)  =  [  EPlQ  \p{Z^K\K))  -  p(Z)]  ~  dr,  ->  0,  (F.10) 

JK  l  J  J  (-1,1)*  L  J 

where  r(K)  denotes  the  width  of  the  cube  K  (which  is  assumed  to  be  centered  at  zero).  Conclusion  (F.10) 
follows  by  (b)  ,  (c),  and  the  domination  (uniform  integrability)  condition: 

for  any  77  6  (0,  l)d  and  any  K,   \p(ZVT{K) (K))  -  p(Z)Y  <  p(Z),  where  EPl(jp(Z)  <  00.  (F.ll) 

But  (F.9)-  (F.ll)  imply  that  it  must  be  that  JK  EP^o  [p(Z*(K))  -  p(Z)]+  d5/X(K)  ->  0  as  r(K)  ->  00. 
Thus  JI(K)  -  J  -+  0  as  #  t  Rd-  ■ 


Technical  Addendum,  Part  II:  Maximum  Likelihood  Estimation 

This  addendum  includes  the  material  on  the  maximum  likelihood  estimation.  The  main  text  only  contains 
the  statement  of  the  result.  The  addendum  will  be  made  available  as  part  of  a  MIT  Economics  Department 
Working  Paper  published  by  the  Social  Science  Research  Network. 


Appendix  G.  Maximum  Likelihood  Procedures. 

The Maximum Likelihood Estimator (MLE) is given by

$$\hat\gamma = (\hat\beta', \hat\alpha')' = \arg\inf_{\gamma \in \mathcal{G}} -L_n(\gamma).$$

We obtain various properties of the MLE such as consistency, rates of convergence ($n$ for the parameter $\beta$ and $\sqrt n$ for the parameter $\alpha$), and its asymptotic distribution. The limit distribution for $\alpha$ is the standard one for smooth likelihood analysis. The asymptotic distribution for $\beta$ is an extreme type distribution, which may be used for inference in the same way as any standard distribution. For example, denoting the limit variable $Z^\beta$ and the parameter sequence $\gamma_n(\delta) = \gamma_0 + H_n\delta$,

$$Z_n^\beta = n(\hat\beta - \beta_n(\delta)) \to_d Z^\beta,$$

and, for any continuously differentiable function $r$ of $\beta$,

$$n\big(r(\hat\beta) - r(\beta_n(\delta))\big) \to_d \frac{\partial}{\partial \beta} r(\beta_0)' Z^\beta.$$

Then, quantiles of the distribution of $\frac{\partial}{\partial \beta} r(\beta_0)' Z^\beta$ can be estimated by simulating a series of draws of $\frac{\partial}{\partial \beta} r(\hat\beta)' Z^\beta$, according to the formulas given in this paper, and then used for classical inference and hypothesis testing. (Alternatively, the parametric bootstrap may be used.) For example, denoting by $c(\tau)$ the $\tau$-quantile of $\frac{\partial}{\partial \beta} r(\hat\beta)' Z^\beta$, an asymptotic 90% confidence interval is given by

$$\left[ r(\hat\beta) - c(0.95)/n,\ \ r(\hat\beta) - c(0.05)/n \right].$$

The limit distribution is also useful for bias correction. For example, to remove the (first-order) asymptotic median bias, simply take $r(\hat\beta) - c(1/2)/n$.
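Given simulated draws of the limit variable, the interval and the median-bias correction described above reduce to a few lines. The following is a minimal sketch; the helper names and the simple order-statistic quantile rule are assumptions made here for illustration, and `draws` stands for simulated values of the limit variable $\frac{\partial}{\partial\beta} r(\hat\beta)' Z^\beta$ obtained by whatever simulation method is used.

```python
def empirical_quantile(draws, tau):
    """Order-statistic quantile: the element at rank round(tau * (len - 1))."""
    ordered = sorted(draws)
    return ordered[round(tau * (len(ordered) - 1))]

def limit_based_interval(r_hat, draws, n, level=0.90):
    """[r_hat - c(1 - a/2)/n, r_hat - c(a/2)/n] with a = 1 - level, using rate n."""
    tail = (1.0 - level) / 2.0
    c_upper = empirical_quantile(draws, 1.0 - tail)
    c_lower = empirical_quantile(draws, tail)
    return (r_hat - c_upper / n, r_hat - c_lower / n)

def median_bias_corrected(r_hat, draws, n):
    """r_hat - c(1/2)/n, the first-order median-bias correction."""
    return r_hat - empirical_quantile(draws, 0.5) / n

if __name__ == "__main__":
    # Placeholder draws (a symmetric grid) purely to show the calling convention.
    draws = [i - 50 for i in range(101)]
    print(limit_based_interval(2.0, draws, n=100))
    print(median_bias_corrected(2.0, draws, n=100))
```

Because the limit distribution of the estimator of $\beta$ is typically asymmetric, the resulting interval will usually not be centered at the point estimate.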

On the other hand, inference about $\alpha$ is standard. The limit distribution of either the MLE or the BEs (for symmetric loss functions) under the parameter sequence $\gamma_n(\delta) = \gamma_0 + H_n\delta$ is given by

$$Z_n^\alpha = \sqrt n(\hat\alpha - \alpha_n(\delta)) \to_d Z^\alpha =_d N(0, J^{-1}),$$

where $J = -E \frac{\partial^2}{\partial \alpha\, \partial \alpha'} \ln f(Y_i - g(X_i, \beta) | X_i, \beta, \alpha)$ is the usual information matrix for $\alpha$. The limit variable $Z^\alpha$ is in fact independent of $Z^\beta$, which follows from the information about $\beta$ coming from a small fraction of the whole sample located near the jump points, and information about $\alpha$ coming from the whole sample and averaged over (so that the impact of the small fraction is negligible).20


20 This intuition is based on e.g. Resnick (1986) and van der Vaart (1999)'s Lemma 21.19 concerning the independence of minimal order statistics and sample averages.
The usual estimates of the information matrix can be used for inference. This result, combined with the one above, can also be used for inference about functions $r(\beta, \alpha)$ of both $\beta$ and $\alpha$ using the delta-method:

>/n(r08,  a)  -  r(/30,a0))  =  dr^ao) ' ^0  -  fa)  +  dr{^ao)' ^i(&  -  Qo)  +  o,(l/VS) 

&dr(g2iaoyza  + 

oa 

Also, in this case it is possible to use the second order expansion, from which $(\hat\beta - \beta_0)$ does not vanish, to better capture the estimation uncertainty, i.e.

/ZMA  A\      rlR    n  ^  i.  dr(p0,a0Y  ZB       gK^op)'  Za '  1  d2r(/3o,a0)  „„  ,- 

v^(r(^a)-r(/?0,ao))«         Q&         -j*  +        9q         Z    +—-—^—Z    +0p(l/^). 

In many situations, such as the previous auction example, the functions of prime interest depend only on $\beta$, and the regular parameter $\alpha$ is not present. Quantiles of the above distributional approximation can be obtained by simulation, which consists of making draws of the variables $Z^\beta$ and $Z^\alpha$, independently of each other, evaluating the above expressions (with derivatives of the function $r$ evaluated at the estimates), and then taking the appropriate quantiles of the simulated series. The resulting quantiles can be used for classical Wald intervals and hypothesis testing.

Theorem G.1 (Properties of MLE). Under C0-C5, and supposing that $-\ell_\infty(z)$ attains a unique minimum in $\mathbb{R}^d$ a.s., then $Z_n = O_p(1)$ and


Particularly,  Z°->d  Za  =  J-'W  =  A/"(0,  J--1),  Z%->d  Z0  =  argmin^  cR^  -  £>oo(u),  and  Ze  and  Z"  are 
independent. 


This result states the consistency, the rates of convergence, and the limit distribution of the MLE. The limit is given in the form of the argmin of a limit likelihood. Due to the asymptotic independence of the information about the shape parameter from the information about the location parameter, the MLEs for these parameters are asymptotically independent.
Remark G.1 (Boundary Models). In the boundary models the limit result can be made more explicit:

$$\sqrt n(\hat\alpha - \alpha_n(\delta)) \to_d \arg\sup_v\, (W'v - v'Jv/2) = J^{-1}W = N(0, J^{-1}),$$

and by (3.7)

$$n(\hat\beta - \beta_n(\delta)) \to_d \arg\inf_u \big( -\exp(u'm) \ \text{such that}\ J_i \ge \Delta(X_i)'u \ \text{for all}\ i \ge 1 \big) = \arg\sup_u \big( u'm \ \text{such that}\ J_i \ge \Delta(X_i)'u \ \text{for all}\ i \ge 1 \big).$$

The limit distribution of the MLE is thus convenient in the boundary models and can be simulated by solving an $L_1$-linear programming problem, which can be done at the speed of OLS through the use of interior point algorithms, cf. Portnoy and Koenker (1997). In contrast, bootstrapping requires repeated solutions of a nonlinear programming problem and is much less practical.
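To make the linear program in Remark G.1 concrete, consider the scalar case. The following is a minimal sketch under strong simplifying assumptions that are not taken from the paper: $\beta$ is scalar, $m > 0$, every $\Delta(X_i) > 0$ is drawn from a small hypothetical set of values, and the points $J_i$ are the arrival times of a unit-rate Poisson process. Under these assumptions the program "maximize $u\,m$ subject to $J_i \ge \Delta(X_i) u$ for all $i$" has the explicit solution $u^* = \min_i J_i / \Delta(X_i)$, so no solver is needed; the multivariate case would require a genuine LP solver and the intensity implied by the model.

```python
import random

def draw_limit_variable(rng, delta_values=(0.5, 1.0, 2.0), rate=1.0):
    """One draw of u* = min_i J_i / Delta_i for the scalar toy problem.

    J_i: arrival times of a Poisson process with the given rate (hypothetical intensity).
    Delta_i: iid draws from delta_values (hypothetical regressor effects, all positive).
    """
    delta_max = max(delta_values)
    best = float("inf")
    j = 0.0
    while True:
        j += rng.expovariate(rate)      # next arrival J_i (increasing in i)
        if j / delta_max >= best:       # no later point can lower the minimum
            return best
        delta = rng.choice(delta_values)
        best = min(best, j / delta)

def simulate_quantile(tau, reps=2000, seed=0):
    """Empirical tau-quantile of the simulated limit variable."""
    rng = random.Random(seed)
    draws = sorted(draw_limit_variable(rng) for _ in range(reps))
    return draws[round(tau * (reps - 1))]

if __name__ == "__main__":
    print(simulate_quantile(0.05), simulate_quantile(0.5), simulate_quantile(0.95))
```

The early-exit rule is exact because the arrivals are increasing, so once $J_i / \Delta_{\max}$ exceeds the current minimum, every later ratio does as well.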

Remark G.2 (Uniqueness). The condition that $-\ell_\infty(z)$ attains a unique minimum a.s. is necessary and does not appear to be problematic. The invertibility of the information matrix $J$ guarantees the uniqueness of $Z^\alpha$, and the solutions of linear programs like those above are unique under mild conditions, which guarantees uniqueness of $Z^\beta$: for example, when the support of $\Delta(X_i)$ does not concentrate on a proper linear subspace of lower dimension and has an absolutely continuous component, and the variables $J_i$ are absolutely continuous; see e.g. Portnoy (1991) for a related problem. Also, when the $\Delta(X_i)$'s have discrete support, the limit result corresponds to the result of Donald and Paarsch (1993a), who show that uniqueness holds in that case too.


G.1. Epi-graphical Convergence. Epi-convergence in distribution has been developed in the work on the stochastic approximation of optimization problems, cf. Knight (2000), Pflug (1995), Salinetti and Wets (1986), Rockafellar and Wets (1998), among others. Suppose that the objectives $\{Q_n\}$ are random lower semi-continuous (l-sc) functions, that is, for each $n$, $Q_n(x) \le \liminf_{x_j \to x} Q_n(x_j)$ for all $x$ and all $x_j \to x$. Let $\mathcal{L}$ be the space of l-sc functions $f : \mathbb{R}^d \to \bar{\mathbb{R}} = [-\infty, +\infty]$ such that $f \not\equiv \infty$.

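$Q_n$ is said to epi-converge in distribution to $Q$ in $\mathcal{L}$ if for any closed rectangles $R_1, \ldots, R_k$ in $\mathbb{R}^d$ with open interiors $R_1^o, \ldots, R_k^o$, and any real $r_1, \ldots, r_k$:

$$\begin{aligned}
P\Big\{ \inf_{x \in R_1} Q(x) > r_1, \ldots, \inf_{x \in R_k} Q(x) > r_k \Big\}
&\overset{(1)}{\le} \liminf_{n} P\Big\{ \inf_{x \in R_1} Q_n(x) > r_1, \ldots, \inf_{x \in R_k} Q_n(x) > r_k \Big\} \\
&\overset{(2)}{\le} \limsup_{n} P\Big\{ \inf_{x \in R_1^o} Q_n(x) \ge r_1, \ldots, \inf_{x \in R_k^o} Q_n(x) \ge r_k \Big\} \\
&\overset{(3)}{\le} P\Big\{ \inf_{x \in R_1^o} Q(x) \ge r_1, \ldots, \inf_{x \in R_k^o} Q(x) \ge r_k \Big\}.
\end{aligned} \tag{G.1}$$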


Note that the inequality (2) holds simply by lower semi-continuity. Epi-convergence is a weak condition that leads to weak convergence of argmins. It is also an evident condition, since $P[\arg\inf_{x \in K} Q_n(x) < a] = P[\inf_{x \in K, x < a} Q_n(x) < \inf_{x \in K, x \ge a} Q_n(x)]$. Thus if one can characterize the joint distribution of the terms $\inf_{x \in K, x < a} Q_n(x)$ and $\inf_{x \in K, x \ge a} Q_n(x)$, one obtains the limit distribution of the argmin.
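The identity between the event "the argmin lies below $a$" and the event "the infimum to the left of $a$ is smaller than the infimum to the right" can be checked numerically on a grid. The sketch below is only an illustration: the objective `values` (a quadratic plus a jump) and the helper names are hypothetical, and the check assumes the minimizer on the grid is unique.

```python
def argmin_below(values, grid, a):
    """True if the (first) grid minimizer of the objective lies strictly below a."""
    k = min(range(len(grid)), key=lambda i: values[i])
    return grid[k] < a


def left_inf_smaller(values, grid, a):
    """True if the infimum over {x < a} is strictly below the infimum over {x >= a}."""
    left = [v for v, x in zip(values, grid) if x < a]
    right = [v for v, x in zip(values, grid) if x >= a]
    return min(left, default=float("inf")) < min(right, default=float("inf"))


# Hypothetical objective with a jump at 0.5; its unique grid minimizer is x = 0.3.
grid = [i / 10 for i in range(11)]
values = [(x - 0.3) ** 2 + (0.5 if x >= 0.5 else 0.0) for x in grid]

for a in (0.1, 0.3, 0.35, 0.9):
    print(a, argmin_below(values, grid, a), left_inf_smaller(values, grid, a))
```

Both columns agree for every threshold, which is the mechanism that lets joint convergence of infima over rectangles deliver convergence of the argmin.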

The following lemma is given in Knight (2000), Pflug (1995), Salinetti and Wets (1986), among others.
Lemma G.1. Let $Z_n$ be s.t. $Q_n(Z_n) \le \inf_{z \in \mathbb{R}^d} Q_n(z) + \varepsilon_n$, $\varepsilon_n \searrow 0$, and suppose

i. $Z_n = O_p(1)$,

ii. $Z_\infty = \arg\inf_{z \in \mathbb{R}^d} Q_\infty(z)$ is uniquely defined in $\mathbb{R}^d$ a.s., and
iii. $Q_n(\cdot)$ epi-converges in distribution to $Q_\infty(\cdot)$,
then $Z_n \to_d Z_\infty$.

Epi-convergence is more general than uniform convergence, because it allows for non-vanishing discontinuities. In our case, the non-vanishing discontinuities make the uniform convergence of the likelihood function impossible. The recent remarkable work of Knight (2000) provides convenient sufficient conditions for verifying epi-convergence, which amount to converting the finite-dimensional limits to epi-limits via a device called stochastic equisemicontinuity. That work extends Salinetti and Wets (1986) by defining an "in probability" version of almost sure stochastic equisemicontinuity. We shall prove epi-convergence directly, albeit borrowing the general structure of the proof from Knight (2000). In fact, part of the proof replicates the proof of Theorem 2 of Knight (2000).
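A small deterministic example may help to see why uniform convergence fails while argmins still converge. In the sketch below, the functions `q_n` and `q_inf` are hypothetical illustrations (not objects from the paper): `q_n` has a unit jump at $1/n$, and `q_inf` is its epi-limit with the jump at $0$.

```python
def q_n(x, n):
    """Lower semi-continuous objective with a unit jump located at 1/n."""
    return 1.0 if x < 1.0 / n else x - 1.0 / n


def q_inf(x):
    """Epi-limit of q_n (it differs from the pointwise limit at x = 0)."""
    return 1.0 if x < 0 else x


GRID = [k / 1000 for k in range(-1000, 1001)]


def sup_distance(n):
    """Sup-norm distance between q_n and q_inf over the grid."""
    return max(abs(q_n(x, n) - q_inf(x)) for x in GRID)


def argmin_on_grid(f):
    """Grid point minimizing f (first one in case of ties)."""
    return min(GRID, key=f)


for n in (10, 100, 1000):
    print(n, sup_distance(n), argmin_on_grid(lambda x: q_n(x, n)))
```

The sup-distance does not shrink with $n$ because the jump never vanishes, yet the minimizer $1/n$ moves to the minimizer $0$ of the epi-limit.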


G.2. Proof of Theorem G.1. First, note that the MLE and other variables, such as $\inf_{z \in K} -Q_n(z)$, are measurable by Proposition 3.2 in Dupacova and Wets (1988), given C0-C3.

Second, we use Lemma G.1 on epi-convergence to prove the result. By definition

$$Z_n = \arg\sup_{z \in U_n} \ell_n(z) = \arg\inf_{z \in U_n} -Q_n(z),$$

where $U_n$ is the rescaled parameter space $\sqrt{n}(\mathcal{A} - \alpha_n(\delta)) \times n(\mathcal{B} - \beta_n(\delta))$, and $Q_n(z)$ is defined in the proof of Theorem 3.1. It will be proved that

$$Z_n \to_d Z = \arg\sup_{z \in \mathbb{R}^d} \ell_\infty(z) = \arg\inf_{z \in \mathbb{R}^d} -Q_\infty(z),$$
where $Q_\infty(z)$ is defined in the proof of Theorem 3.1.

Lemma G.1 may be applied by checking three conditions:

(a) epi-convergence in distribution of $-Q_n$ to its finite-dimensional limit $-Q_\infty$,

(b) $Z_n = O_p(1)$, and

(c) uniqueness of $Z$.

Condition (c) is assumed. Condition (b) is shown below. It is more difficult to prove (a). The general idea of the proof is borrowed from Knight (2000)'s proof of his Theorem 2. The specifics are based on bounding two types of moduli of continuity by a strategy that is similar to the one in Ibragimov and Has'minskii (1981a).

The definition of epi-convergence in (G.1) consists of parts (1)-(3). We verify part (1) only; part (3) follows almost identically, and part (2) holds trivially (by the definition of lower semi-continuity). To simplify notation, in what follows we do not index $P$ by $\gamma_n(\delta)$.

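Given a collection of rectangles $R_1, \ldots, R_k$, write

$$P\Big\{ \inf_{z \in R_1} -Q_n(z) > r_1, \ldots, \inf_{z \in R_k} -Q_n(z) > r_k \Big\} = 1 - P\Big\{ \cup_{j \le k} \Big\{ \inf_{z \in R_j} -Q_n(z) \le r_j \Big\} \Big\}.$$

Thus, to verify (1) in (G.1) it suffices to show

$$\limsup_n P\Big\{ \cup_{j \le k} \Big\{ \inf_{z \in R_j} -Q_n(z) \le r_j \Big\} \Big\} \le P\Big\{ \cup_{j \le k} \Big\{ \inf_{z \in R_j} -Q_\infty(z) \le r_j \Big\} \Big\}. \tag{G.2}$$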

To explain the result clearly, first bound the probability of the event

$$\Big\{ \inf_{z \in R} -Q_n(z) \le r \Big\} = \Big\{ \inf_{z \in R} \big( -Q_n^c(z) - Q_n^d(u) \big) \le r \Big\}.$$

Denote $R = R^\alpha \times R^\beta$. Define two sets of grid-points as follows.

Consider the grids of equidistant points $\{v_s\}$ and $\{u_m\}$ inside the rectangles $R^\alpha$ and $R^\beta$ such that the sup-distance between adjacent points is at most $\varphi$. Also cover $R^\beta$ by the sets $V_{kj}$, as defined in the proof of Lemma G.3, and let $u_{kj}$ denote a carefully chosen point inside the cover set $V_{kj}$, as defined in the proof of Lemma G.3.

Next, define the collection of points $\{z_l\}$ as the Cartesian product $\{v_s\} \times (\{u_m\} \cup \{u_{kj}\})$. This collection of grid points has the property that the nearest grid-points are at most $\varphi$ apart from each other. The collection $\{z_l\}$ will be used to approximate $\inf_{z \in R} -Q_n^c(z)$ by $\inf_{z \in \{z_l\}} -Q_n^c(z)$. The collection $\{u_{kj}\}$ will be used to approximate the behavior of $\inf_{u \in R^\beta} -Q_n^d(u)$ by $\inf_{u \in \{u_{kj}\}} -Q_n^d(u)$.
Then

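$$\Big\{ \inf_{z \in R} -Q_n(z) \le r \Big\} \subset \Big( \Big\{ \inf_{z \in \{z_l\}} -Q_n^c(z) + \inf_{u \in \{u_{kj}\}} -Q_n^d(u) \le r + \varepsilon \Big\} \cap A \Big) \cup A^c, \tag{G.3}$$

where $A$ is the event that the finite-dimensional approximation "works" and $A^c$ is its complement, that is

$$A = \big\{ w_{Q_n^c}(R, \varphi) \le \varepsilon, \ \xi_{Q_n^d}(R, \varphi) = 0 \big\},$$

where $w_{Q_n^c}(R, \varphi)$ and $\xi_{Q_n^d}(R, \varphi)$ are the moduli of continuity of the continuous part $Q_n^c$ and the discontinuous part $Q_n^d$, respectively:

$$\begin{aligned}
w_{Q_n^c}(R, \varphi) &= \sup_{z_1, z_2 \in R : |z_1 - z_2| \le \varphi} |Q_n^c(z_1) - Q_n^c(z_2)|, \\
\xi_{Q_n^d}(R, \varphi) &= 1\Big\{ \inf_{u \in R^\beta} -Q_n^d(u) < \inf_{u \in \{u_{kj}\}} -Q_n^d(u) \Big\}.
\end{aligned} \tag{G.4}$$

The modulus $w_{Q_n^c}(R, \varphi)$ is a standard measure of equicontinuity of $Q_n^c$. The modulus $\xi_{Q_n^d}(R, \varphi)$ is a Skorohod-type modulus: it tells whether the infimum of the step function $-Q_n^d(u)$ coincides with the minimum of $-Q_n^d(u)$ computed over a finite set of grid points $\{u_{kj}\}$.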
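To make the Skorohod-type modulus concrete, the following sketch evaluates it for a scalar step function. Everything here is a simplified, hypothetical stand-in: `breaks` and `jumps` play the role of the break-points and jump sizes of a step function of the form $\sum_i c_i 1(0 < j_i \le u)$, and `grid` plays the role of the finite set of evaluation points.

```python
def step_value(u, breaks, jumps):
    """Value at u of f(u) = sum_i jumps[i] * 1(0 < breaks[i] <= u)."""
    return sum(c for b, c in zip(breaks, jumps) if 0 < b <= u)


def exact_inf(a, b, breaks, jumps):
    """Infimum of f on [a, b].

    f is piecewise constant and changes only at break-points, so it is
    enough to evaluate it at a and at every break-point inside (a, b].
    """
    candidates = [a] + [x for x in breaks if a < x <= b]
    return min(step_value(u, breaks, jumps) for u in candidates)


def skorohod_indicator(a, b, grid, breaks, jumps):
    """Return 1 if the grid misses the infimum of f on [a, b], else 0."""
    grid_min = min(step_value(u, breaks, jumps) for u in grid)
    return int(exact_inf(a, b, breaks, jumps) < grid_min)


breaks, jumps = [0.3, 0.7], [-1.0, 2.0]
print(skorohod_indicator(0.0, 1.0, [0.0, 0.5, 1.0], breaks, jumps))         # grid hits the low piece
print(skorohod_indicator(0.0, 1.0, [0.0, 0.25, 0.75, 1.0], breaks, jumps))  # grid misses it
```

The indicator equals one exactly when no grid point lands on the piece where the step function attains its infimum, which is the event whose probability the proof needs to control.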

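Lemma G.3 bounds the probability of $A^c$:

$$P\{ w_{Q_n^c}(R, \varphi) > \varepsilon \} \le \text{const} \cdot |R| \cdot \varepsilon^{-1} \varphi, \qquad P\{ \xi_{Q_n^d}(R, \varphi) = 1 \} \le \text{const} \cdot |R| \cdot \varphi, \tag{G.5}$$

where $|R| = \sup\{|z| : z \in R\}$. For any $\varepsilon > 0$ and given $R$, we can pick $\varphi(\varepsilon)$ small enough such that each rhs of (G.5) is smaller than $\varepsilon/2$.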

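Hence

$$P\Big\{ \inf_{z \in R} -Q_n(z) \le r \Big\} \le P\Big\{ \inf_{z \in \{z_l\}} -Q_n^c(z) + \inf_{u \in \{u_{kj}\}} -Q_n^d(u) \le r + \varepsilon \Big\} + \varepsilon,$$

and by Theorem 3.1 and the Portmanteau Lemma

$$\begin{aligned}
\limsup_n P\Big\{ \inf_{z \in R} -Q_n(z) \le r \Big\} &\le P\Big\{ \inf_{z \in \{z_l\}} -Q_\infty^c(z) + \inf_{u \in \{u_{kj}\}} -Q_\infty^d(u) \le r + \varepsilon \Big\} + \varepsilon \\
&\le P\Big\{ \inf_{z \in R} -Q_\infty(z) \le r + \varepsilon \Big\} + \varepsilon.
\end{aligned}$$
Since $\varepsilon > 0$ is arbitrary

$$\limsup_n P\Big\{ \inf_{z \in R} -Q_n(z) \le r \Big\} \le P\Big\{ \inf_{z \in R} -Q_\infty(z) \le r \Big\}.$$

Thus for $\cup_j R_j \subset R$, it follows that

$$\limsup_n P\Big\{ \cup_{j=1}^k \Big\{ \inf_{z \in R_j} -Q_n(z) \le r_j \Big\} \Big\} \le P\Big\{ \cup_{j=1}^k \Big\{ \inf_{z \in R_j} -Q_\infty(z) \le r_j + \varepsilon \Big\} \Big\} + \varepsilon.$$

Since $\varepsilon$ is arbitrary, the required conclusion (G.2) follows.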

Finally, it remains to establish $Z_n = O_p(1)$. First, the MLE $\hat\gamma$ is consistent by a generalization of Wald's Theorem on Consistency of MLE (Theorem 3.3 of Artstein and Wets (1995)), which requires

(a) $\gamma \mapsto -\ln f(Y_i - g(X_i, \beta) \mid X_i, \gamma)$ is a.s. lower semi-continuous (which is true by C2 and C3),

(b) the domination: $\sup_{\gamma} E_{P_\gamma} \sup_{\gamma'} \ln f(Y_i - g(X_i, \beta') \mid X_i, \gamma') \le \ln \bar f < +\infty$, which is true by C2,

(c) identification (by C0),

(d) compact parameter space (by C0).



Consistency implies that, with probability approaching 1, $Z_n \in S_n = \{z : |v|/\sqrt{n} \le \varepsilon_n, |u|/n \le \varepsilon_n\}$ for some $\varepsilon_n \to 0$, which allows the use of inequalities (G.6) and (G.7) proved in Lemma G.3, along with the exponential inequality for $E\,\ell_n^{1/2}(z)$ given in Lemma C.2. These inequalities imply, by a standard argument like that on p. 265 in Ibragimov and Has'minskii (1981a) [see Lemma G.4], that for sufficiently large $A > 0$ and any $N > 0$

$$P\Big\{ \sup_{z \in S_n : |z| > A} \ell_n(z) \ge A^{-N} \Big\} \le C_N A^{-N}, \quad \text{where } C_N > 0.$$

Hence $P\{|Z_n| > A\} \le C_N A^{-N}$, and it follows that $Z_n = O_p(1)$. ■

Lemma G.2 (Lipschitz Continuity of $Q_n^c$). Under C0-C5, for a small $\delta > 0$, there is a random variable $C_n \ge 0$ such that for all $n \ge n_0$ and some large $n_0$

$$|Q_n^c(z_1) - Q_n^c(z_2)| \le C_n |z_1 - z_2| (|z_1| + 1), \qquad \sup_{n \ge n_0, \gamma \in B_\delta(\gamma_0)} E_{P_\gamma} C_n < \infty,$$

uniformly over all $|z_1 - z_2| \le 1$ in the set $S_n = \{z : |v|/\sqrt{n} \le \varepsilon_n, |u|/n \le \varepsilon_n\}$, where $\varepsilon_n \to 0$.

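Lemma G.3 (Bounding Moduli of Continuity). Under C0-C5, for all $n \ge n_0$, where $n_0$ is sufficiently large, for some $\delta > 0$, and any bounded rectangle $R \subset S_n$:

1. For all sufficiently small $\varphi > 0$ and $\varepsilon > 0$

$$\sup_{\gamma \in B_\delta(\gamma_0)} P_\gamma\big( w_{Q_n^c}(R, \varphi) > \varepsilon \big) \le \text{const} \cdot |R| \cdot \varepsilon^{-1} \cdot \varphi. \tag{G.6}$$

2. For all sufficiently small $\varphi > 0$

$$\sup_{\gamma \in B_\delta(\gamma_0)} P_\gamma\big( \xi_{Q_n^d}(R, \varphi) = 1 \big) \le \text{const} \cdot |R| \cdot \varphi, \tag{G.7}$$

where $|R| = \sup\{|z| : z \in R\}$, and $w_{Q_n^c}(\cdot)$ and $\xi_{Q_n^d}(\cdot)$ are defined in the proof of Theorem G.1.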

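Lemma G.4 (Tail Bound). Under C0-C5, for sufficiently large $A > 0$ and any $N > 0$

$$P_\gamma\Big\{ \sup_{z \in S_n : |z| > A} \ell_n(z) \ge A^{-N} \Big\} \le C_N A^{-N}, \quad \text{where } C_N > 0,$$

uniformly in $\gamma \in B_\delta(\gamma_0)$ for some $\delta > 0$.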

G.3. Proof of Lemma G.2 (short proof). Lemma G.2 is needed for the MLE part only. The result follows by a standard empirical process argument, noting that the object of interest is a function that is an average and that is a spline-type object. The result then follows by a Taylor-like expansion, obtaining expressions of the form $C_n(\epsilon_i, z_1, z_2) |z_1 - z_2| (|z_1| + 1)$, and finally applying a maximal moment inequality to the coefficients $C_n(\epsilon_i, z_1, z_2)$, specifically Lemma 19.34 in van der Vaart (1999). [Details are given in the last section of this document.] ■

G.4.  Proof  of  Lemma  G.3.  The  first  part  follows  by  the  Markov  inequality  and  Lemma  G.2.  The  second 
part  is  proven  below.  In  the  one-dimensional  case  with  no  covariates,  the  argument  essentially  reduces  to 
the  proof  given  on  p. 262  in  Ibragimov  and  Has'minskii  (1981a). 

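(a) [Covering Sets.] For a hyper-cube $R = R^\alpha \times R^\beta$, where $R^\alpha \subset \mathbb{R}^{d_\alpha}$ and $R^\beta \subset \mathbb{R}^{d_\beta}$, construct a collection of (possibly overlapping) subsets $\{V_{kj}\}$ of $R^\beta$ as follows. First cover the support of the vector $\Delta(X)$ by the minimal number of closed equal-sized cubes $\{\mathcal{X}_{\phi,j}, j \le J(\phi)\}$ with the side-length of the cube equal to $\phi < 1$. There are $J(\phi) \le \text{const}\,(1/\phi)^{d_\beta}$ such cubes.

Recall that

$$\frac{\partial \Delta_n(X, u)}{\partial u} = \frac{\partial g(X, \beta)}{\partial \beta}\Big|_{\beta = \beta_0 + u/n}.$$

Note also that uniformly in $u$ in $R^\beta$ and uniformly in $R^\beta \subset \{u : \|u\|/n \le \varepsilon_n\}$ for some given sequence $\varepsilon_n \to 0$, we have

$$\frac{\partial \Delta_n(X, u)}{\partial u} = \Delta(X) + o(1).$$

In particular, choose $n_0$ such that for all $n \ge n_0$ ($n_0$ depends on $\phi$)

$$\Big| \frac{\partial \Delta_n(X, u)}{\partial u} - \Delta(X) \Big| \le \phi^2 \quad \text{a.s.}$$

Thus, for any given $x$ and any $R^\beta$, and given that $\Delta(x) \in \mathcal{X}_{\phi,j}$, we have that

$$\cup_{u \in R^\beta} \frac{\partial \Delta_n(x, u)}{\partial u} \text{ belongs at most to } K' = 2^{d_\beta} \text{ cubes of the form } \mathcal{X}_{\phi,j'} \text{ that are adjacent to } \mathcal{X}_{\phi,j}. \tag{G.8}$$

(We only need that $K'$ is finite and is independent of $\phi$ and $R^\beta$.) Thus, in what follows it is helpful to think of $\partial \Delta_n(x, u)/\partial u$ as being equal to $\Delta(x)$ and independent of $u$.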

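Construct the (overlapping) sets$^{21}$

$$\{V_{kj}, \ k = -m, \ldots, m, \ j = 1, \ldots, J(\phi)\} \subset \mathbb{R}^{d_\beta}$$
such that

$$V_{kj} = \big\{ u \in \mathbb{R}^{d_\beta} : v_k - \varphi \le \Delta_n(x, u) \le v_k + \varphi \text{ for all } n \ge n_0 \text{ and all } x \text{ s.t. } \Delta(x) \in \mathcal{X}_{\phi,j} \big\},$$
where $\varphi > 0$ and

$$v_k = k\varphi, \quad \text{for } k \in \{-m, \ldots, 0, \ldots, m\}.$$

Since the range of $|\Delta_n(X, u)|$ is bounded a.s. by $\bar g' \|R^\beta\|$ for all $n$, we can cover the range by $2m + 1$ brackets of the form $[v_k - \varphi, v_k + \varphi]$, where $m \le \text{const}\,|R^\beta|/\varphi$ and $|R^\beta| = \sup\{|u| : u \in R^\beta\}$. Choose

$$\phi \propto \varphi / |R^\beta| \tag{G.9}$$

for all small $\varphi$. Hence the total number $L$ of covering sets $V_{kj}$ is bounded as $L \le (2m + 1) \cdot J(\phi)$ and grows at most polynomially in $|R^\beta|$ and in $1/\varphi$.$^{22}$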
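The bracket construction can be illustrated with a few lines of code. The sketch below is a simplified scalar picture under stated assumptions: `bound` stands in for the a.s. bound on the index range, `n_cubes` stands in for the number of cubes covering the support of the gradient, and all function names are hypothetical.

```python
import math


def bracket_indices(x, phi, bound):
    """Indices k of the brackets [k*phi - phi, k*phi + phi] that contain x.

    The centers k*phi run over k = -m, ..., m with m = ceil(bound / phi),
    so the brackets overlap and cover [-bound, bound].
    """
    m = math.ceil(bound / phi)
    return [k for k in range(-m, m + 1) if k * phi - phi <= x <= k * phi + phi]


def covering_count(bound, phi, n_cubes):
    """Upper bound (2m + 1) * n_cubes on the number of covering sets."""
    m = math.ceil(bound / phi)
    return (2 * m + 1) * n_cubes


print(bracket_indices(0.3, 0.25, 1.0))  # a generic point lies in two overlapping brackets
print(bracket_indices(0.5, 0.25, 1.0))  # a bracket center also touches both neighbours
print(covering_count(1.0, 0.25, 4))
```

Because adjacent brackets overlap, every point of the range falls into at least two of them, and the count of covering sets grows only polynomially as the bracket half-width shrinks.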

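Next, construct the "centers" $u_{kj}$ in $V_{kj} \cap R^\beta$ so that for all $n \ge n_0$ $^{23}$

$$\delta_{kj} \le \Delta_n(x, u_{kj}) \le \delta_{kj} + \eta, \quad \forall x : \Delta(x) \in \mathcal{X}_{\phi,j}, \tag{G.10}$$

where

$$\delta_{kj} = \inf \Delta_n(x, u), \text{ where the inf is taken over } u \in V_{kj} \cap R^\beta \text{ and } x : \Delta(x) \in \mathcal{X}_{\phi,j}.$$

We will need that $\eta : 0 < \eta \ll \varphi$, i.e. that $\eta$ is sufficiently small relative to $\varphi$. Moreover, in order to satisfy the constraint in (G.10), we need to have $\phi$ set sufficiently small as well. Setting $\phi$ small restricts the variation of $\Delta(x)$, and hence of $\partial \Delta_n(x, u_{kj})/\partial u$, to at most $\text{const} \cdot \phi$ when $x : \Delta(x) \in \mathcal{X}_{\phi,j}$. Thus, we choose $\eta$ as $\eta \propto \varphi^2$ and $\phi$ as stated in (G.9).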


21 The covering sets $V_{kj}$ can be thought of as "approximate linear subspaces" of $\mathbb{R}^{d_\beta}$.

22 Note also that the $V_{kj}$ clearly cover $R^\beta$ for $n \ge n_0$, because given $u$ we have that $\Delta_n(x, u)$ belongs to at least two different brackets of the form $[v_k - \varphi, v_k + \varphi]$ for all $n \ge n_0$, and $\Delta(x) \in \mathcal{X}_{\phi,j}$ for some $\mathcal{X}_{\phi,j}$ that is at most $\phi^2$ away from $\partial \Delta_n(x, u)/\partial u$ for all $n \ge n_0$. Hence $u \in V_{kj}$ for some $k$ and $j$.

23 Note that for $V_{kj}$ in the interior of $R^\beta$, it is the case that $\delta_{kj} = v_k - \varphi$; but otherwise, this is not the case.

(b) [Characterization of Break-Points] Recall that
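$$Q_n^d(u) = \sum_{i=1}^n \Big[ \ln\frac{q(X_i)}{p(X_i)} \, 1\big(0 < n\epsilon_i \le \Delta_n(X_i, u)\big) + \ln\frac{p(X_i)}{q(X_i)} \, 1\big(0 \ge n\epsilon_i > \Delta_n(X_i, u)\big) \Big].$$

We next examine the nature of the discontinuities of $Q_n^d(u)$ by first examining those of $Q_n^{d+}(u)$ and then those of $Q_n^{d-}(u)$.

Suppose we have $n\epsilon_i = \Delta_n(X_i, u)$ for some $u \in V_{kj}$ and $\Delta(X_i) \in \mathcal{X}_{\phi,j}$; then the pair $(n\epsilon_i, X_i)$ is said to induce a break-point in the set $V_{kj}$ and in the bracket $[v_k - \varphi, v_k + \varphi]$ to which $\Delta_n(X_i, u)$ belongs.$^{24}$

Given that this is the only pair that induces a break-point in $V_{kj}$ it follows that
$$\inf_{u \in V_{kj}} -Q_n^{d+}(u) \ne -Q_n^{d+}(u_{kj}) \quad \text{only if} \quad n\epsilon_i \in [\delta_{kj}, \delta_{kj} + \eta],$$

since $-Q_n^{d+}(u)$ is piecewise-constant and can only jump up if the index $\Delta_n(X_i, u)$ increases.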

Thus, what we need is as follows. First, we need to control the probability of the event that more than one break-point happens in any of the brackets of the form $[v_k - \varphi, v_k + \varphi]$ for $|k| \le m$. This is included in the event that the errors $n\epsilon_i$ are not separated in non-overlapping brackets, which is the event $A_1(R) = \cup_{|k| \le m}\{\text{there are } n\epsilon_i, n\epsilon_{i'} \in [v_k - \varphi, v_k + \varphi]\}$. Second, we need to control the probability that the $n\epsilon_i$ that are separated into the brackets $[v_k - \varphi, v_k + \varphi]$ fall into the "bad subset" $[\delta_{kj}, \delta_{kj} + \eta]$ of such brackets, given that $\Delta(X_i) \in \mathcal{X}_{\phi,j}$. Formally, conditionally on the complement of $A_1(R)$, i.e. on $A_1^c(R)$, define the event $A_2(R)$ as the union of

$$A_{2i,kj}(R) = \big[ n\epsilon_i \in [\delta_{kj}, \delta_{kj} + \eta] \ \big| \ n\epsilon_i \in [v_k - \varphi, v_k + \varphi], \ \Delta(X_i) \in \mathcal{X}_{\phi,j}, \ u \in V_{kj} \big]$$
across $i \le n$, $|k| \le m$, $j \le J(\phi)$.
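To begin,

$$P\{A_1(R)\} \le \sum_{|k| \le m} \sum_{i=1}^n \sum_{i'=1 : i' \ne i}^n P\big\{ n\epsilon_i, n\epsilon_{i'} \in [v_k - \varphi, v_k + \varphi] \big\} \le (2m + 1) \cdot (2 \bar f \varphi)^2 \le \text{const}\,|R| \varphi.$$

Denote the total number of $n\epsilon_i$ that fall into brackets of the form $[v_k - \varphi, v_k + \varphi]$ by $N_n$. Because (i) any bracket $[v_k - \varphi, v_k + \varphi]$ overlaps with at most two other brackets and (ii) (G.8) holds for $n \ge n_0$, there are at most $3 \cdot K' \cdot N_n$ neighborhoods of the form $V_{kj}$ in which the break-point may occur (where $K'$ is defined in (G.8)). Then

$$P\{A_2(R) \mid N_n, A_1^c(R)\} \le 3 \cdot K' \cdot N_n \cdot \sup_{i \le n, |k| \le m, j \le J(\phi)} P\{A_{2i,kj}(R)\} \le 3 \cdot K' \cdot N_n \cdot (\bar f / \underline{f}) \cdot (\eta/(2\varphi)).$$

Since $E\{N_n\} \le n E 1(|n\epsilon_i| \le \bar g' \|R\|) \le 2 \bar g' \|R\| \bar f$,

$$P\{A_2(R) \mid A_1^c(R)\} \le \text{const}\,|R| (\eta/\varphi).$$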
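Hence (since $\eta \propto \varphi^2$)

$$\begin{aligned}
P\Big\{ \cup_{kj} \Big\{ \inf_{u \in V_{kj}} -Q_n^{d+}(u) \ne -Q_n^{d+}(u_{kj}) \Big\} \Big\} &\le P\{A_2(R) \mid A_1^c(R)\} + P\{A_1(R)\} \\
&\le \text{const}\,|R| (\eta/\varphi + \varphi) \le \text{const}\,|R| \varphi.
\end{aligned}$$

24 The terminology "break-points" is borrowed from the linear programming literature.

Therefore, conclude that

$$I = P\Big\{ \inf_{u \in R^\beta} -Q_n^{d+}(u) < \inf_{u \in \{u_{kj}\}} -Q_n^{d+}(u) \Big\} \le \text{const}\,|R| \varphi.$$

Likewise, it follows that for a finite collection of grid points $\{u_{kj}\}$

$$II = P\Big\{ \inf_{u \in R^\beta} -Q_n^{d-}(u) < \inf_{u \in \{u_{kj}\}} -Q_n^{d-}(u) \Big\} \le \text{const}\,|R| \varphi.$$

Finally,

$$P\Big\{ \inf_{u \in R^\beta} -Q_n^d(u) < \inf_{u \in \{u_{kj}\}} -Q_n^d(u) \Big\} \le I + II \le \text{const}\,|R| \varphi. \quad ■$$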

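G.5. Proof of Lemma G.4. This uses the method of the proof on p. 265-266 of Ibragimov and Has'minskii (1981a). We need to establish for sufficiently large $A > 1$ and any $N > 0$

$$P_\gamma\Big\{ \sup_{z \in S_n : |z| > A} \ell_n(z) \ge A^{-N} \Big\} \le C_N A^{-N}, \tag{G.11}$$

where $C_N$ denotes a generic constant that only depends on $N$. Let $R(t) = \{z : t \le |z| \le t + 1\}$. It will suffice to show that for sufficiently large $t \ge A$

$$P_\gamma\Big\{ \sup_{z \in R(t) \cap S_n} \ell_n(z) \ge t^{-N} \Big\} \le C_N t^{-N}, \tag{G.12}$$

since then

$$P_\gamma\Big\{ \sup_{z \in S_n : |z| > A} \ell_n(z) \ge A^{-N} \Big\} \le \sum_{t=0}^{\infty} P_\gamma\Big\{ \sup_{z \in R(A+t) \cap S_n} \ell_n(z) \ge (A + t)^{-N-1} \Big\} \le C_N A^{-N}.$$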

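Next cover $R(t)$ by grid-points $\{z_l\}$ in the way defined in the proof of Theorem G.1. It follows that

$$P_\gamma\Big\{ \sup_{z \in R(t) \cap S_n} \ell_n(z) \ge t^{-N} \Big\} \le \underbrace{P_\gamma\Big\{ \sup_{z \in \{z_l\}} \ell_n(z) \ge t^{-N}/2 \Big\}}_{I} + \underbrace{P_\gamma\Big\{ w_{Q_n^c}(R(t), \varphi) > |\ln(t^{-N}/2)| \ \cup \ \xi_{Q_n^d}(R(t), \varphi) = 1 \Big\}}_{II},$$

where $w_{Q_n^c}$ and $\xi_{Q_n^d}$ are the moduli of continuity defined in (G.4).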

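The number of partition points $\{z_l\}$ is bounded by $L \le \text{const} \cdot (|R(t)|/\varphi)^\kappa$, where $|R(t)| = \sup\{|z| : z \in R(t)\} = t + 1$, that is $L \le \text{const}\,(t + 1)^\kappa \varphi^{-\kappa}$, where $1 \le \kappa < \infty$ ($\kappa$ is given in the proof of Lemma G.2).

By Lemma C.2 and the Markov inequality, $P_\gamma\{\ell_n(z_l) \ge t^{-N}/2\} \le \text{const}\, t^N \exp(-b|t|)$. Hence for $t \ge A$

$$I \le \text{const} \cdot (t + 1)^\kappa \varphi^{-\kappa} \cdot t^N \cdot e^{-b|t|}, \quad b > 0. \tag{G.13}$$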

By Lemma G.3, noting that $|R(t)| = t + 1$,

$$II \le \text{const}\,(t + 1) |\ln(t^{-N}/2)|^{-1} \varphi + \text{const}\,(t + 1) \varphi. \tag{G.14}$$

Select $\varphi = t^{-2N-1}$; then (G.12) immediately follows from (G.13) and (G.14). ■

G.6. Proof of Lemma G.2 (Detailed proof). We have

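$$\begin{aligned}
Q_n^c(z) = {} & \underbrace{\sum_{i=1}^n \bar r_{in}(z) \times \big[ 1(\epsilon_i > \{\Delta_n(X_i, u)/n\} \vee 0) + 1(\epsilon_i \le \{\Delta_n(X_i, u)/n\} \wedge 0) \big]}_{Q_{1n}^c(z)} \\
& + \underbrace{\sum_{i=1}^n \big( \bar r_{in}(z) - r_{in}(z) \big) \times \big[ 1(0 < \epsilon_i \le \Delta_n(X_i, u)/n) + 1(0 \ge \epsilon_i > \Delta_n(X_i, u)/n) \big]}_{Q_{2n}^c(z)}.
\end{aligned}$$

Recall that $Q_{2n}^c(z) = 0$ in the one-sided models.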

Intuitively, note that the functions of interest are all Lipschitz-smooth (spline-type) objects by construction, given the differentiability assumptions in C1-C3. Thus, it is reasonable to expect the final result:

Under C1-C5, for a small $\delta > 0$, there is a random variable $C_n \ge 0$ such that for all $n \ge n_0$ and some large $n_0$

$$|Q_n^c(z_1) - Q_n^c(z_2)| \le C_n |z_1 - z_2| (|z_1| + 1), \qquad \sup_{n \ge n_0, \gamma \in B_\delta(\gamma_0)} E_{P_\gamma} C_n < \infty,$$

uniformly over all $|z_1 - z_2| \le 1$ in the set $S_n = \{z : |v|/\sqrt{n} \le \varepsilon_n, |u|/n \le \varepsilon_n\}$, where $\varepsilon_n \to 0$.

The proof is tedious, but it has a very simple structure. Given some careful Taylor-type expansions, the maximal inequalities are applied to the coefficients of those expansions to obtain the required result.

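Split $Q_{1n}^c(z_1) - Q_{1n}^c(z_2)$ into two terms

$$I = \sum_{i=1}^n \bar r_{in}(z_1) 1(\epsilon_i > \{\Delta_n(X_i, u_1)/n\} \vee 0) - \bar r_{in}(z_2) 1(\epsilon_i > \{\Delta_n(X_i, u_2)/n\} \vee 0),$$

$$II = \sum_{i=1}^n \bar r_{in}(z_1) 1(\epsilon_i \le \{\Delta_n(X_i, u_1)/n\} \wedge 0) - \bar r_{in}(z_2) 1(\epsilon_i \le \{\Delta_n(X_i, u_2)/n\} \wedge 0).$$

We focus on term $I$, and only briefly indicate the differences for term $II$. The term $I$ can be bounded as

$$I_1 - I_2 \le I \le I_1 + I_2,$$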


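where

$$\begin{aligned}
I_1 = \sum_{i=1}^n & 1(\epsilon_i > \{\Delta_n(X_i, u_1)/n\} \vee \{\Delta_n(X_i, u_2)/n\} \vee 0) \\
& \times \big( \ln f(\epsilon_i - \Delta_n(X_i, u_1)/n \mid X_i; \beta_0 + u_1/n, \alpha_0 + v_1/\sqrt{n}) \\
& \quad - \ln f(\epsilon_i - \Delta_n(X_i, u_2)/n \mid X_i; \beta_0 + u_2/n, \alpha_0 + v_2/\sqrt{n}) \big),
\end{aligned}$$

$$I_2 = \sum_{i=1}^n 1(0 < \epsilon_i \in [\Delta_n(X_i, u_1)/n, \Delta_n(X_i, u_2)/n]) \times \max_{j=1,2} \Big| \ln \frac{f(\epsilon_i - \Delta_n(X_i, u_j)/n \mid X_i; \beta_0 + u_j/n, \alpha_0 + v_j/\sqrt{n})}{f(\epsilon_i \mid X_i; \beta_0, \alpha_0)} \Big|,$$

where $[a, b]$ denotes $\{x : a \le x \le b \text{ or } b \le x \le a\}$.
Analogously bound the term $II$ as follows:

$$II_1 - II_2 \le II \le II_1 + II_2,$$
where

$$\begin{aligned}
II_1 = \sum_{i=1}^n & 1(\epsilon_i \le \{\Delta_n(X_i, u_1)/n\} \wedge \{\Delta_n(X_i, u_2)/n\} \wedge 0) \\
& \times \big( \ln f(\epsilon_i - \Delta_n(X_i, u_1)/n \mid X_i; \beta_0 + u_1/n, \alpha_0 + v_1/\sqrt{n}) \\
& \quad - \ln f(\epsilon_i - \Delta_n(X_i, u_2)/n \mid X_i; \beta_0 + u_2/n, \alpha_0 + v_2/\sqrt{n}) \big),
\end{aligned}$$

$$II_2 = \sum_{i=1}^n 1(0 \ge \epsilon_i \in [\Delta_n(X_i, u_1)/n, \Delta_n(X_i, u_2)/n]) \times \max_{j=1,2} \Big| \ln \frac{f(\epsilon_i - \Delta_n(X_i, u_j)/n \mid X_i; \beta_0 + u_j/n, \alpha_0 + v_j/\sqrt{n})}{f(\epsilon_i \mid X_i; \beta_0, \alpha_0)} \Big|,$$

where $[a, b]$ denotes $\{x : a \le x \le b \text{ or } b \le x \le a\}$.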

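Part I. Bounds on Terms $I_1$ and $II_1$. By Taylor expansion, the term $I_1$ equals $I_{11} + I_{12} + I_{13}$, where

$$I_{11} = \sum_{i=1}^n 1(\epsilon_i > \{\Delta_n(X_i, u_1)/n\} \vee \{\Delta_n(X_i, u_2)/n\} \vee 0) \times \frac{\partial}{\partial u} \ln f(\epsilon_i - \Delta_n(X_i, u^*)/n \mid X_i; \beta_0 + u^*/n, \alpha_0 + v^*/\sqrt{n}) \, (u_1 - u_2)/n,$$

$$I_{12} = \sum_{i=1}^n 1(\epsilon_i > \{\Delta_n(X_i, u_1)/n\} \vee \{\Delta_n(X_i, u_2)/n\} \vee 0) \, \frac{\partial}{\partial \alpha} \ln f(\epsilon_i \mid X_i, \gamma_0) \, (v_1 - v_2)/\sqrt{n},$$

$$I_{13} = \sum_{i=1}^n 1(\epsilon_i > \{\Delta_n(X_i, u_1)/n\} \vee \{\Delta_n(X_i, u_2)/n\} \vee 0) \times (H_n z^{**})' \frac{\partial^2}{\partial \gamma \partial \alpha} \ln f(\epsilon_i - \Delta_n(X_i, u^{**})/n \mid X_i; \beta_0 + u^{**}/n, \alpha_0 + v^{**}/\sqrt{n}) \, (v_1 - v_2)/\sqrt{n}.$$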

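Analogously decompose the term $II_1$ as $II_{11} + II_{12} + II_{13}$, where

$$II_{11} = \sum_{i=1}^n 1(\epsilon_i \le \{\Delta_n(X_i, u_1)/n\} \wedge \{\Delta_n(X_i, u_2)/n\} \wedge 0) \times \frac{\partial}{\partial u} \ln f(\epsilon_i - \Delta_n(X_i, u^*)/n \mid X_i; \beta_0 + u^*/n, \alpha_0 + v^*/\sqrt{n}) \, (u_1 - u_2)/n,$$

$$II_{12} = \sum_{i=1}^n 1(\epsilon_i \le \{\Delta_n(X_i, u_1)/n\} \wedge \{\Delta_n(X_i, u_2)/n\} \wedge 0) \, \frac{\partial}{\partial \alpha} \ln f(\epsilon_i \mid X_i, \gamma_0) \, (v_1 - v_2)/\sqrt{n},$$

$$II_{13} = \sum_{i=1}^n 1(\epsilon_i \le \{\Delta_n(X_i, u_1)/n\} \wedge \{\Delta_n(X_i, u_2)/n\} \wedge 0) \times (H_n z^{**})' \frac{\partial^2}{\partial \gamma \partial \alpha} \ln f(\epsilon_i - \Delta_n(X_i, u^{**})/n \mid X_i; \beta_0 + u^{**}/n, \alpha_0 + v^{**}/\sqrt{n}) \, (v_1 - v_2)/\sqrt{n}.$$

By C5, the term $|I_{11} + II_{11}|$ is bounded by

$$\Big( \frac{1}{n} \sum_{i=1}^n C_2(\epsilon_i, X_i) \Big) |u_1 - u_2|, \tag{G.15}$$

where the expectation of the random term is finite and constant across $n$ by iid sampling.
By C5, the term $I_{13} + II_{13}$ is bounded by

$$(|z_1| + 1) |z_1 - z_2| \, \frac{1}{n} \sum_{i=1}^n C_3(\epsilon_i, X_i), \tag{G.16}$$

where the expectation of the random term is constant for all $n$.
Write

$$I_{12} + II_{12} = III_1 - III_2 - III_3, \tag{G.17}$$

where

$$III_1 = \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{\partial}{\partial \alpha} \ln f(\epsilon_i \mid X_i, \gamma_0) \, (v_1 - v_2),$$

$$III_2 = \frac{1}{\sqrt{n}} \sum_{i=1}^n 1(0 < \epsilon_i \in [\Delta_n(X_i, u_1)/n, \Delta_n(X_i, u_2)/n]) \, \frac{\partial}{\partial \alpha} \ln f(\epsilon_i \mid X_i, \gamma_0) \, (v_1 - v_2),$$

$$III_3 = \frac{1}{\sqrt{n}} \sum_{i=1}^n 1(0 \ge \epsilon_i \in [\Delta_n(X_i, u_1)/n, \Delta_n(X_i, u_2)/n]) \, \frac{\partial}{\partial \alpha} \ln f(\epsilon_i \mid X_i, \gamma_0) \, (v_1 - v_2).$$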

By C5, $III_1$ has two finite moments, which remain constant for all $n$. Next we show a bound for $III_2$; the same bound for $III_3$ follows identically. By Lemma 19.34 in van der Vaart (1999)

$$E \sup_{|u_1|/n + |u_2|/n \le 2\varepsilon_n} |III_2 - E\,III_2| \le J_{[\,]}(\bar F, \mathcal{F}, L_2(P)) < \infty, \tag{G.18}$$

where $J_{[\,]}(\bar F, \mathcal{F}, L_2(P))$ is the $L_2(P)$ bracketing entropy of the function class, which we rewrite in terms of the original parameter

$$\mathcal{F} = \Big\{ 1(g(X_i, \beta_0) < Y_i \in [g(X_i, \beta_1), g(X_i, \beta_2)]) \, \frac{\partial}{\partial \alpha} \ln f(\epsilon_i \mid X_i, \gamma_0), \ |\beta_1 - \beta_0| + |\beta_2 - \beta_0| \le 2\varepsilon_n \Big\}$$

with the constant envelope $\bar F = (\bar f'/\underline{f}) \mathbf{1}$ by C3, where $\mathbf{1}$ is the vector of ones. Note that the bound $|u_1 - u_2|/n \le \varepsilon_n$ eventually puts the $\beta_0 + u_j/n$'s in a small fixed neighborhood of $\beta_0$ for $n \ge n_0$, and also puts the $\Delta_n(X_i, u_j)/n$'s in any small fixed neighborhood of 0.

The entropy in (G.18) is finite uniformly in $n$ by a standard argument, because $\mathcal{F}$ is formed as a product of a Donsker class

$$\big\{ 1(g(X_i, \beta_0) < Y_i \in [g(X_i, \beta_1), g(X_i, \beta_2)]), \ |\beta_1 - \beta_0| + |\beta_2 - \beta_0| \le 2\varepsilon_n \big\}$$

(see type V functions in Andrews (1994)) and a random variable bounded by $\bar F$, which by Theorem 2.10.6 in van der Vaart and Wellner (1996) implies that $\mathcal{F}$ is Donsker, and thus $J_{[\,]}(\bar F, \mathcal{F}, L_2(P)) < \infty$.

Also note that by C2 and C3:

$$\begin{aligned}
|E\,III_2| &= E \sqrt{n} \int_{\Delta_n(X_i, u_1)/n}^{\Delta_n(X_i, u_2)/n} \Big| \frac{\partial}{\partial \alpha} \ln f(\epsilon \mid X_i, \gamma_0) \, f(\epsilon \mid X_i, \gamma_0) \Big| 1(\epsilon > 0) \, d\epsilon \\
&\le E \sqrt{n} \int_{\Delta_n(X_i, u_1)/n}^{\Delta_n(X_i, u_2)/n} \bar f' \, 1(\epsilon > 0) \, d\epsilon \le \bar g \cdot \bar f' \cdot \|u_1 - u_2\|/\sqrt{n},
\end{aligned}$$

where the constants are defined in Lemma A.1.

Thus, for some random variable $C_n$ with bounded expectation uniformly in $n$

$$|I_{12} + II_{12}| \le C_n |v_1 - v_2|.$$

Now collecting all the bounds established so far, we have for some random variable $C_n$ with bounded expectation uniformly in $n$

$$|I_1 + II_1| \le C_n |z_1 - z_2| (|z_1| + 1). \tag{G.19}$$

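Part II. Bounds on Terms $I_2$ and $II_2$. We now return to the terms $I_2$ and $II_2$. Recall that

$$I_2 = \sum_{i=1}^n 1(0 < \epsilon_i \in [\Delta_n(X_i, u_1)/n, \Delta_n(X_i, u_2)/n]) \times \max_{j=1,2} \Big| \ln \frac{f(\epsilon_i - \Delta_n(X_i, u_j)/n \mid X_i; \beta_0 + u_j/n, \alpha_0 + v_j/\sqrt{n})}{f(\epsilon_i \mid X_i; \beta_0, \alpha_0)} \Big|$$

and that

$$II_2 = \sum_{i=1}^n 1(0 \ge \epsilon_i \in [\Delta_n(X_i, u_1)/n, \Delta_n(X_i, u_2)/n]) \times \max_{j=1,2} \Big| \ln \frac{f(\epsilon_i - \Delta_n(X_i, u_j)/n \mid X_i; \beta_0 + u_j/n, \alpha_0 + v_j/\sqrt{n})}{f(\epsilon_i \mid X_i; \beta_0, \alpha_0)} \Big|.$$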

By C2-C3, $I_2$ is bounded by

$$I_{21} = \frac{1}{n} \sum_{i=1}^n 1(0 < \epsilon_i \in [\Delta_n(X_i, u_1)/n, \Delta_n(X_i, u_2)/n]) (\bar f'/\underline{f}) \times \|u_1 - u_2\|,$$

where the constants are defined in Lemma A.1.

By C2-C3, $II_2$ is bounded by

$$II_{21} = \frac{1}{n} \sum_{i=1}^n 1(0 \ge \epsilon_i \in [\Delta_n(X_i, u_1)/n, \Delta_n(X_i, u_2)/n]) (\bar f'/\underline{f}) \times \|u_1 - u_2\|,$$

where the constants are defined in Lemma A.1. Then, by an argument that is identical to the proof of inequality (G.18), we obtain that uniformly in $n$

$$E \sup_{u_1, u_2} |\sqrt{n}(I_{21} - E I_{21})| < \infty,$$

and an identical bound follows for the term $II_{21}$:

$$E \sup_{u_1, u_2} |\sqrt{n}(II_{21} - E\,II_{21})| < \infty.$$

Furthermore, by C2-C3, $E(I_{21} + II_{21}) \le \text{const} \cdot |u_1 - u_2|$.
Hence for a random variable $C_n$ with uniformly bounded expectation (uniformly in $n$ and in $\gamma$)
$$|I_2 + II_2| \le |I_{21} + II_{21}| \le \text{const}\,(1 + C_n/\sqrt{n}) |u_1 - u_2|.$$

Combining this inequality with the one in (G.19), the bound in the statement of the lemma follows:
For a small $\delta > 0$, there is a random variable $C_n \ge 0$ such that for all $n \ge n_0$ and some large $n_0$

$$|Q_{1n}^c(z_1) - Q_{1n}^c(z_2)| \le C_n |z_1 - z_2| (|z_1| + 1), \qquad \sup_{n \ge n_0, \gamma \in B_\delta(\gamma_0)} E_{P_\gamma} C_n < \infty,$$
uniformly over all $|z_1 - z_2| \le 1$ in the set $S_n = \{z : |v|/\sqrt{n} \le \varepsilon_n, |u|/n \le \varepsilon_n\}$, where $\varepsilon_n \to 0$.

Similarly, it follows that for a small $\delta > 0$, there is a random variable $C_n \ge 0$ such that for all $n \ge n_0$ and some large $n_0$

$$|Q_{2n}^c(z_1) - Q_{2n}^c(z_2)| \le C_n |z_1 - z_2| (|z_1| + 1), \qquad \sup_{n \ge n_0, \gamma \in B_\delta(\gamma_0)} E_{P_\gamma} C_n < \infty,$$

uniformly over all $|z_1 - z_2| \le 1$ in the set $S_n = \{z : |v|/\sqrt{n} \le \varepsilon_n, |u|/n \le \varepsilon_n\}$, where $\varepsilon_n \to 0$. Note that $C_n = 0$ in the one-sided models. ■